|
| 1 | +# Measure Recall |
| 2 | + |
| 3 | +In vector search, recall is the percentage of the true nearest neighbors that the index actually returns. For example, if a query for the 200 nearest neighbors returns 194 of the ground-truth nearest neighbors, then the recall is 194/200 × 100 = 97%. |
| 4 | + |
| 5 | +In a vector query, recall is important because it measures the percentage of relevant results retrieved from a search. Recall helps you evaluate the quality of a vector index and provides insight into balancing search speed and accuracy. |
| 6 | + |
| 7 | +With VectorChord, you can measure the recall of any SQL query that performs a vector search on a vector index. You can then tune the [search parameters](indexing#search-parameters) until the query reaches the desired recall. |
| 8 | + |
| 9 | +::: code-group |
| 10 | + |
| 11 | +```sql [vchordrq <badge type="tip" text="since v0.5.0" />] |
| 12 | +-- You can tune the search parameters before measuring |
| 13 | +-- SET vchordrq.probes = '100' |
| 14 | +-- SET vchordrq.epsilon = 1.0 |
| 15 | + |
| 16 | +SELECT vchordrq_evaluate_query_recall(query=>$$ |
| 17 | + SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10 |
| 18 | +$$); |
| 19 | +``` |
| 20 | + |
| 21 | +```sql [vchordg <badge type="tip" text="since v0.5.0" />] |
| 22 | +-- You can tune the search parameters before measuring |
| 23 | +-- SET vchordrq.probes = '100' |
| 24 | +-- SET vchordrq.epsilon = 1.0 |
| 25 | + |
| 26 | +-- Fast evaluation is not implemented for vchordg yet; as a workaround, use exact_search=>true |
| 27 | +SELECT vchordrq_evaluate_query_recall(query=>$$ |
| 28 | + SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10 |
| 29 | +$$, exact_search=>true); |
| 30 | +``` |
| 31 | + |
| 32 | +::: |
| 33 | + |
| 34 | +::: details |
| 35 | + |
| 36 | +By default (`exact_search=>false`), the function builds a **slightly inaccurate** ground truth by running the query with `vchordrq.probes` set to an extremely large value (65535). This is much faster than the full table scan performed with `exact_search=>true`, and the resulting accuracy is acceptable in most situations. |
| 37 | + |
| 38 | +::: |
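|  | + |
|  | +As a rough sketch of the tuning workflow described above, you can run the same measurement under different `vchordrq.probes` values and compare the returned recall. The `items` table and the probe values `10` and `100` are illustrative assumptions, not recommendations. `exact_search=>true` is used so that both runs are scored against the same full-scan ground truth, which can be slow on large tables. |
|  | + |
|  | +```sql |
|  | +-- Assumption: an `items` table with a vchordrq index on `embedding` |
|  | +-- First run: fewer probes (faster search, usually lower recall) |
|  | +SET vchordrq.probes = '10'; |
|  | +SELECT vchordrq_evaluate_query_recall(query=>$$ |
|  | +  SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10 |
|  | +$$, exact_search=>true); |
|  | + |
|  | +-- Second run: more probes (slower search, usually higher recall) |
|  | +SET vchordrq.probes = '100'; |
|  | +SELECT vchordrq_evaluate_query_recall(query=>$$ |
|  | +  SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10 |
|  | +$$, exact_search=>true); |
|  | + |
|  | +-- Restore the session default once you are done measuring |
|  | +RESET vchordrq.probes; |
|  | +``` |
|  | + |
|  | +If the second run already meets your recall target, a value between the two is worth trying to recover some search speed. |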
| 39 | + |
| 40 | +## Reference |
| 41 | + |
| 42 | +### Functions <badge type="info" text="vchordrq" /> |
| 43 | + |
| 44 | +#### `vchordrq_evaluate_query_recall` |
| 45 | + |
| 46 | +- Description: Evaluates the recall of a given SQL query. |
| 47 | +- Result: `real` (a value between 0.0 and 1.0, or NaN if no results are found) |
| 48 | +- Arguments: |
| 49 | + - `query`(text): The SQL query to be evaluated. |
| 50 | + - `exact_search`(boolean): A flag that indicates whether a full table scan should be performed to build the ground truth set. The default value is false. |
| 51 | + - `accu_probes`(text): Used when `exact_search` is false. It specifies the `vchordrq.probes` value for the ANN search that generates the estimated ground truth. If NULL, it will be derived from the active `vchordrq.probes` setting during the initial query execution. |
| 52 | + - `accu_epsilon`(real): Used when `exact_search` is false. It specifies the `vchordrq.epsilon` value for the ANN search that generates the estimated ground truth. The default value is 1.9. |
| 53 | +- Example: |
| 54 | + - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$);` |
| 55 | + - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$, exact_search=>true);` |
| 56 | + - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$, exact_search=>false, accu_probes=>'100', accu_epsilon=>3.9);` |