
Commit 0e32df6

add function vchordrq_evaluate_query_recall
Signed-off-by: cutecutecat <[email protected]>
1 parent b055c4b commit 0e32df6


3 files changed: +58, -1 lines changed


.vitepress/config.mts

Lines changed: 1 addition & 0 deletions
@@ -195,6 +195,7 @@ export default defineConfig({
 { text: 'Similarity Filter', link: '/vectorchord/usage/range-query' },
 { text: 'PostgreSQL Tuning', link: '/vectorchord/usage/performance-tuning' },
 { text: 'Monitoring', link: '/vectorchord/usage/monitoring' },
+{ text: 'Measure Recall', link: '/vectorchord/usage/measure-recall' },
 { text: 'Prewarm', link: '/vectorchord/usage/prewarm' },
 { text: 'Prefilter', link: '/vectorchord/usage/prefilter' },
 { text: 'Prefetch', link: '/vectorchord/usage/prefetch' },

src/vectorchord/usage/indexing.md

Lines changed: 1 addition & 1 deletion
@@ -211,7 +211,7 @@ The operator classes for `MaxSim` are available since version `0.3.0`.
 - `build.internal.build_threads = 1` means that the K-means algorithm uses $1$ thread.
 - `build.internal.build_threads = 4` means that the K-means algorithm uses $4$ threads.
 
-### Search Parameters <badge type="info" text="vchordrq" />
+### Search Parameters <badge type="info" text="vchordrq" /> {#search-parameters}
 
 #### `vchordrq.probes`

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Measure Recall

In vector search, recall is the percentage of the true nearest neighbors (the ground truth) that the index actually returns. For example, if a query for the 200 nearest neighbors returns 194 of the 200 ground-truth nearest neighbors, the recall is 194/200 × 100% = 97%.

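As a worked illustration of the definition above, the following TypeScript sketch computes recall from two lists of row identifiers. It is not part of VectorChord: the `recall` helper and the numeric IDs are hypothetical, and it assumes the denominator is the size of the ground-truth set, so an empty ground truth yields `NaN` (0 / 0).

```typescript
// Hypothetical helper, not part of VectorChord: recall of an ANN result
// measured against a ground-truth set of nearest-neighbor IDs.
function recall(returned: readonly number[], groundTruth: readonly number[]): number {
  const truth = new Set(groundTruth);
  // Count distinct returned IDs that are true nearest neighbors.
  const hits = new Set(returned.filter((id) => truth.has(id))).size;
  // With an empty ground truth this is 0 / 0, i.e. NaN.
  return hits / truth.size;
}

// Worked example from the text: 194 of the 200 true neighbors are returned.
const groundTruth = Array.from({ length: 200 }, (_, i) => i); // IDs 0..199
const returned = [...groundTruth.slice(0, 194), 1000, 1001, 1002, 1003, 1004, 1005];
console.log(recall(returned, groundTruth)); // 0.97
```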
Recall matters because it measures how many of the relevant results a search actually retrieves. It lets you evaluate the quality of a vector index and tune the trade-off between search speed and accuracy.

With VectorChord, you can measure the recall of any SQL vector query that uses a vector index, and then tune the [search parameters](indexing#search-parameters) to reach the recall you need.

::: code-group

```sql [vchordrq <badge type="tip" text="since v0.5.0" />]
-- Optionally tune the search parameters before measuring:
-- SET vchordrq.probes = '100';
-- SET vchordrq.epsilon = 1.0;

SELECT vchordrq_evaluate_query_recall(query=>$$
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10
$$);
```

```sql [vchordg <badge type="tip" text="since v0.5.0" />]
-- Optionally tune the search parameters before measuring:
-- SET vchordrq.probes = '100';
-- SET vchordrq.epsilon = 1.0;

-- Fast evaluation is not implemented for vchordg yet; as a workaround,
-- compute the ground truth with an exact (full table) search
SELECT vchordrq_evaluate_query_recall(query=>$$
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10
$$, exact_search=>true);
```

:::

::: details

By default, the function uses `exact_search=>false`, which generates a **slightly inaccurate** ground truth by running the ANN search with `vchordrq.probes` set to an extremely large value (65535). This is much faster than `exact_search=>true`, which performs a full table scan, and the resulting precision is acceptable in most situations.

:::

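To show how an approximate ground truth can skew the measured value, the TypeScript sketch below scores one hypothetical ANN result against both an exact and an approximate ground truth. The IDs are invented for illustration, and the `recall` helper is hypothetical; the sketch does not model how VectorChord actually builds either set.

```typescript
// Hypothetical illustration: the same ANN result scored against an exact
// ground truth and against an approximate one (as with exact_search=>false).
function recall(returned: readonly number[], groundTruth: readonly number[]): number {
  const truth = new Set(groundTruth);
  const hits = new Set(returned.filter((id) => truth.has(id))).size;
  return hits / truth.size;
}

// True 5 nearest neighbors (what a full table scan would return).
const exactTruth = [1, 2, 3, 4, 5];
// High-probes ANN search that misses neighbor 5 and returns 6 instead.
const approxTruth = [1, 2, 3, 4, 6];
// Low-probes ANN result being evaluated: it also returns 6 instead of 5,
// and additionally misses 4.
const annResult = [1, 2, 3, 6, 7];

console.log(recall(annResult, exactTruth));  // 0.6
console.log(recall(annResult, approxTruth)); // 0.8
```

Here the miss shared by both ANN searches inflates the estimate from 0.6 to 0.8; a wrong entry in the approximate ground truth can likewise deflate it. The gap shrinks as the approximate ground truth approaches the exact one.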
## Reference

### Functions <badge type="info" text="vchordrq" />

#### `vchordrq_evaluate_query_recall`

- Description: Evaluates the recall of a given SQL query.
- Result: `real` (a value between 0.0 and 1.0, or NaN if no results are found)
- Arguments:
  - `query` (text): The SQL query to be evaluated.
  - `exact_search` (boolean): Whether to compute the ground truth set with a full table scan. The default value is false.
  - `accu_probes` (text): Used when `exact_search` is false. Specifies the `vchordrq.probes` value for the ANN search that generates the estimated ground truth. If NULL, it is derived from the active `vchordrq.probes` setting during the initial query execution.
  - `accu_epsilon` (real): Used when `exact_search` is false. Specifies the `vchordrq.epsilon` value for the ANN search that generates the estimated ground truth. The default value is 1.9.
- Example:
  - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$);`
  - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$, exact_search=>true);`
  - `SELECT vchordrq_evaluate_query_recall(query=>$$SELECT ctid FROM t ORDER BY val <-> '[0.5, 0.25, 1.0]' LIMIT 10$$, exact_search=>true, accu_probes=>'100', accu_epsilon=>3.9);`
