Commit 7c6c630

add vchordg
Signed-off-by: usamoi <[email protected]>
1 parent 52c714b commit 7c6c630

File tree

12 files changed

+193
-0
lines changed

.vitepress/config.mts

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ export default defineConfig({
191191
items: [
192192
{ text: 'Indexing', link: '/vectorchord/usage/indexing' },
193193
{ text: 'Multi-Vector Retrieval', link: '/vectorchord/usage/indexing-with-maxsim-operators' },
194+
{ text: 'Graph Index', link: '/vectorchord/usage/graph-index' },
194195
{ text: 'Similarity Filter', link: '/vectorchord/usage/range-query' },
195196
{ text: 'PostgreSQL Tuning', link: '/vectorchord/usage/performance-tuning' },
196197
{ text: 'Monitoring', link: '/vectorchord/usage/monitoring' },

src/vectorchord/getting-started/overview.md

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ For more usage, please read:
7878

7979
- [Indexing](/vectorchord/usage/indexing)
8080
- [Multi-Vector Retrieval](/vectorchord/usage/indexing-with-maxsim-operators)
81+
- [Graph Index](/vectorchord/usage/graph-index)
8182
- [Similarity Filter](/vectorchord/usage/range-query)
8283
- [PostgreSQL Tuning](/vectorchord/usage/performance-tuning)
8384
- [Monitoring](/vectorchord/usage/monitoring)

src/vectorchord/index.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
1010

1111
- [Indexing](/vectorchord/usage/indexing)
1212
- [Multi-Vector Retrieval](/vectorchord/usage/indexing-with-maxsim-operators)
13+
- [Graph Index](/vectorchord/usage/graph-index)
1314
- [Similarity Filter](/vectorchord/usage/range-query)
1415
- [PostgreSQL Tuning](/vectorchord/usage/performance-tuning)
1516
- [Monitoring](/vectorchord/usage/monitoring)

src/vectorchord/usage/external-build.md

Lines changed: 2 additions & 0 deletions
@@ -32,3 +32,5 @@ $$);
3232
```
3333

3434
To simplify the workflow, we provide end-to-end scripts for external index pre-computation, refer to [Run External Index Precomputation Toolkit](https://github.com/tensorchord/VectorChord/tree/main/scripts#run-external-index-precomputation-toolkit).
35+
36+
This feature is not supported by `vchordg`, because building a `vchordg` index does not include this step.

src/vectorchord/usage/graph-index.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
1+
# Graph Index <badge type="tip" text="since v0.5.0" />
2+
3+
VectorChord's index type `vchordg` is a disk-based graph index. It has low memory consumption, at the cost of a slow build.
4+
5+
To build a vector index, start by creating a table named `items` with an `embedding` column of type `vector(n)`, then populate it with sample data.
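
A minimal sketch of that setup, assuming the extension is installed under the name `vchord` and using a hypothetical 3-dimensional `embedding` column filled with random sample data:

```sql
-- Assumption: the extension is named vchord and depends on pgvector (hence CASCADE).
CREATE EXTENSION IF NOT EXISTS vchord CASCADE;

-- A small table with a 3-dimensional embedding column.
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- Populate it with 1000 random vectors; real[] is cast to vector on assignment.
INSERT INTO items (embedding)
SELECT ARRAY[random(), random(), random()]::real[]
FROM generate_series(1, 1000);
```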
6+
7+
To create a `vchordg` index, you can use the following SQL.
8+
9+
```sql
10+
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);
11+
```
12+
13+
## Tuning
14+
15+
When building an index, you usually need to tune two parameters: `m` and `ef_construction`. `m` is the maximum number of neighbors for each vertex, and `ef_construction` is the search range used for each vertex when building the graph. `m` corresponds to $m_0$ in HNSW and $m$ in DiskANN. `ef_construction` corresponds to $\text{ef}_\text{construction}$ in HNSW and $\text{ef}_C$ in DiskANN. When searching, you need to tune `ef_search`, which corresponds to $\text{ef}$ in HNSW and DiskANN.
16+
17+
```sql
18+
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
19+
m = 64
20+
ef_construction = 128
21+
$$);
22+
23+
SET vchordg.ef_search TO '128';
24+
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
25+
```
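
If a larger `ef_search` is only needed for a single query, the setting can be scoped to one transaction. The sketch below relies on PostgreSQL's standard `SET LOCAL`; the value `256` is an arbitrary illustration, not a recommendation.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL vchordg.ef_search = 256;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
COMMIT;

-- After COMMIT, the setting is back to its previous session value.
SHOW vchordg.ef_search;
```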
26+
27+
As a disk-based index, `vchordg` usually only requires the quantized vectors to be in the buffer pool to maintain performance. By default, `vchordg` quantizes a $D$-dimensional vector to $2D$ bits. Let the number of rows be $N$; then the quantized vectors take $2DN$ bits in total, which is the memory the index needs. If you have very limited memory and are using ultra-high dimensional vectors, consider setting `bits = 1`, which quantizes each vector to $D$ bits and halves the memory requirement.
28+
29+
```sql
30+
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
31+
bits = 1
32+
m = 64
33+
ef_construction = 128
34+
$$);
35+
```
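
As a rough worked example of the memory estimate, assume a hypothetical dataset with $N = 10^6$ rows of $D = 768$ dimensions (both numbers are illustrative only):

$$
\begin{aligned}
\text{bits} = 2:&\quad 2DN = 2 \times 768 \times 10^6 = 1.536 \times 10^9 \text{ bits} = 192 \text{ MB} \\
\text{bits} = 1:&\quad DN = 768 \times 10^6 = 7.68 \times 10^8 \text{ bits} = 96 \text{ MB}
\end{aligned}
$$

This counts only the quantized vectors, which is the part that needs to stay in the buffer pool.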
36+
37+
Index building can be sped up by using multiple processes. Refer to [PostgreSQL Tuning](performance-tuning.md#indexing).
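
A minimal sketch of a parallel build, assuming `vchordg` honors PostgreSQL's standard parallel maintenance settings in the same way as other parallel index builds. The worker count of `8` is an arbitrary assumption; the linked page is the place to look for recommended values.

```sql
-- Standard PostgreSQL settings; 8 is an illustrative value only.
-- Both are capped by max_worker_processes, which requires a server restart to change.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;

CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);
```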
38+
39+
## Reference
40+
41+
### Operator Classes <badge type="info" text="vchordg" /> {#operator-classes}
42+
43+
The following table lists all available operator classes supported by `vchordg`.
44+
45+
| Operator Class | Description | Operator 1 | Operator 2 |
46+
| -------------------- | --------------------------------------------------------- | ---------------------- | ------------------------ |
47+
| `vector_l2_ops` | index works for `vector` type and Euclidean distance | `<->(vector,vector)` | `<<->>(vector,vector)` |
48+
| `vector_ip_ops` | index works for `vector` type and negative inner product | `<#>(vector,vector)` | `<<#>>(vector,vector)` |
49+
| `vector_cosine_ops` | index works for `vector` type and cosine distance | `<=>(vector,vector)` | `<<=>>(vector,vector)` |
50+
| `halfvec_l2_ops` | index works for `halfvec` type and Euclidean distance | `<->(halfvec,halfvec)` | `<<->>(halfvec,halfvec)` |
51+
| `halfvec_ip_ops` | index works for `halfvec` type and negative inner product | `<#>(halfvec,halfvec)` | `<<#>>(halfvec,halfvec)` |
52+
| `halfvec_cosine_ops` | index works for `halfvec` type and cosine distance | `<=>(halfvec,halfvec)` | `<<=>>(halfvec,halfvec)` |
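
The operator class chosen for the index should match the operator used in the query. As a sketch, assuming the `items` table described earlier, a cosine-distance index and a matching query could look like this:

```sql
-- Index built for cosine distance.
CREATE INDEX ON items USING vchordg (embedding vector_cosine_ops);

-- The query uses <=>, the operator listed for vector_cosine_ops.
SELECT * FROM items ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```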
53+
54+
`<<->>`, `<<#>>`, `<<=>>` are operators defined by VectorChord.
55+
56+
For more information about `<<->>`, `<<#>>`, `<<=>>`, refer to [Similarity Filter](range-query).
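
A hedged sketch of how such an operator is typically used, assuming the `sphere(vector, real)` constructor described on the Similarity Filter page and an arbitrary radius of `0.5`; check that page for the exact signature before relying on it.

```sql
-- Keep only rows within Euclidean distance 0.5 of the query vector,
-- then order the survivors by distance.
SELECT *
FROM items
WHERE embedding <<->> sphere('[3,1,2]'::vector, 0.5)
ORDER BY embedding <-> '[3,1,2]'
LIMIT 10;
```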
57+
58+
All operator classes are available since version `0.5.0`.
59+
60+
### Indexing Options <badge type="info" text="vchordg" />
61+
62+
#### `bits` <badge type="tip" text="since v0.5.0" />
63+
64+
- Description: The number of bits each dimension of a vector is quantized to.
65+
- Type: integer
66+
- Default: `2`
67+
- Example:
68+
- `bits = 2` means that a $D$-dimensional vector is quantized to $2D$ bits.
69+
- `bits = 1` means that a $D$-dimensional vector is quantized to $D$ bits.
70+
71+
#### `m` <badge type="tip" text="since v0.5.0" />
72+
73+
- Description: The maximum number of neighbors for each vertex in the graph.
74+
- Type: integer
75+
- Default: `32`
76+
- Example:
77+
- `m = 32` means that there are at most $32$ neighbors for each vertex.
78+
- `m = 64` means that there are at most $64$ neighbors for each vertex.
79+
80+
#### `alpha` <badge type="tip" text="since v0.5.0" />
81+
82+
- Description: .
83+
- Type: list of floats
84+
- Default: `[1.0, 1.2]`
85+
- Example:
86+
- `alpha = [1.0, 1.2]` means that .
87+
- `alpha = [1.0]` means that .
88+
89+
#### `ef_construction` <badge type="tip" text="since v0.5.0" />
90+
91+
- Description: The search range used for each vertex when building the graph.
92+
- Type: integer
93+
- Default: `64`
94+
- Example:
95+
- `ef_construction = 64` means that .
96+
- `ef_construction = 128` means that .
97+
98+
#### `beam_construction` <badge type="tip" text="since v0.5.0" />
99+
100+
- Description: .
101+
- Type: integer
102+
- Default: `1`
103+
- Example:
104+
- `beam_construction = 8` means that .
105+
- `beam_construction = 1` means that .
106+
107+
### Search Parameters <badge type="info" text="vchordg" />
108+
109+
#### `vchordg.ef_search` <badge type="tip" text="since v0.5.0" />
110+
111+
- Description: The search range used when querying the graph, corresponding to $\text{ef}$ in HNSW and DiskANN.
112+
- Type: integer
113+
- Default: `64`
114+
- Domain: `[1, 65535]`
115+
- Example:
116+
- `SET vchordg.ef_search = 64` indicates .
117+
- `SET vchordg.ef_search = 128` indicates .
118+
119+
#### `vchordg.beam_search` <badge type="tip" text="since v0.5.0" />
120+
121+
- Description: .
122+
- Type: integer
123+
- Default: `64`
124+
- Domain: `[1, 65535]`
125+
- Example:
126+
- `SET vchordg.beam_search = 8` indicates .
127+
- `SET vchordg.beam_search = 1` indicates .

src/vectorchord/usage/monitoring.md

Lines changed: 11 additions & 0 deletions
@@ -22,3 +22,14 @@ There are 5 steps in the index construction process of `vchordrq`. The phase nam
2222
| 5 | `compacting tuples in index` | Optimize the structure of the vector index to enhance performance | Medium |
2323

2424
The 4th step, `inserting tuples from table to index`, takes up the majority of the time during index construction. The `tuples_done`, `blocks_done` and `blocks_total` columns indicate the progress of this step.
25+
26+
### Phases <badge type="info" text="vchordg" />
27+
28+
There are 2 steps in the index construction process of `vchordg`. The phase names of the steps are as follows:
29+
30+
| Step | Message | Description | Waiting time |
31+
| ---- | -------------------------------------- | --------------------------------- | ------------ |
32+
| 1 | `initializing` | Start building index | Short |
33+
| 2 | `inserting tuples from table to index` | Insert all vectors into the graph | Long |
34+
35+
The 2nd step, `inserting tuples from table to index`, takes up the majority of the time during index construction. The `tuples_done`, `blocks_done` and `blocks_total` columns indicate the progress of this step.
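
These columns come from PostgreSQL's standard `pg_stat_progress_create_index` view. A minimal sketch of polling it from another session while the build runs (the `percent_done` column is a convenience added here, not something the view provides):

```sql
SELECT phase,
       tuples_done,
       blocks_done,
       blocks_total,
       -- nullif avoids division by zero before blocks_total is known.
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS percent_done
FROM pg_stat_progress_create_index;
```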

src/vectorchord/usage/multi-vector-retrieval.md

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,8 @@ ORDER BY embeddings @#
4646

4747
## Reference
4848

49+
This feature is not supported by `vchordg`.
50+
4951
### Operator Classes
5052

5153
Refer to

src/vectorchord/usage/prefetch.md

Lines changed: 32 additions & 0 deletions
@@ -55,3 +55,35 @@ Furthermore, the [Asynchronous I/O](https://pganalyze.com/blog/postgres-18-async
5555
- `read_buffer` indicates a preference for `ReadBuffer`.
5656
- `prefetch_buffer` indicates a preference for both `PrefetchBuffer` and `ReadBuffer`.
5757
- `read_stream` indicates a preference for `read_stream`.
58+
59+
### Search Parameters <badge type="info" text="vchordg" />
60+
61+
#### `vchordg.io_search` <badge type="tip" text="since v0.5.0" />
62+
63+
- Description: This GUC parameter controls the I/O prefetching strategy for reading bit vectors in vector search, which can impact search performance on disk-based vectors.
64+
- Type: string
65+
- Domain: Depends on PostgreSQL version
66+
- PostgreSQL 13, 14, 15, 16: `{"read_buffer", "prefetch_buffer"}`
67+
- PostgreSQL 17: `{"read_buffer", "prefetch_buffer", "read_stream"}`
68+
- Default: Depends on PostgreSQL version
69+
- PostgreSQL 13, 14, 15, 16: `prefetch_buffer`
70+
- PostgreSQL 17: `read_stream`
71+
- Note:
72+
- `read_buffer` indicates a preference for `ReadBuffer`.
73+
- `prefetch_buffer` indicates a preference for both `PrefetchBuffer` and `ReadBuffer`.
74+
- `read_stream` indicates a preference for `read_stream`.
75+
76+
#### `vchordg.io_rerank` <badge type="tip" text="since v0.5.0" />
77+
78+
- Description: This GUC parameter controls the I/O prefetching strategy for reading full precision vectors in vector search, which can significantly impact search performance on disk-based vectors.
79+
- Type: string
80+
- Domain: Depends on PostgreSQL version
81+
- PostgreSQL 13, 14, 15, 16: `{"read_buffer", "prefetch_buffer"}`
82+
- PostgreSQL 17: `{"read_buffer", "prefetch_buffer", "read_stream"}`
83+
- Default: Depends on PostgreSQL version
84+
- PostgreSQL 13, 14, 15, 16: `prefetch_buffer`
85+
- PostgreSQL 17: `read_stream`
86+
- Note:
87+
- `read_buffer` indicates a preference for `ReadBuffer`.
88+
- `prefetch_buffer` indicates a preference for both `PrefetchBuffer` and `ReadBuffer`.
89+
- `read_stream` indicates a preference for `read_stream`.
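
A short sketch of setting both parameters for the current session. It uses `prefetch_buffer` because that value appears in the domain for every PostgreSQL version listed above; whether it is the best choice for a given workload is something to measure.

```sql
-- I/O strategy for reading bit vectors during search.
SET vchordg.io_search = 'prefetch_buffer';
-- I/O strategy for reading full precision vectors during rerank.
SET vchordg.io_rerank = 'prefetch_buffer';

-- Confirm the active values.
SHOW vchordg.io_search;
SHOW vchordg.io_rerank;
```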

src/vectorchord/usage/prefilter.md

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ Based on our experimental results, the QPS speedup at different selectivity is a
4545

4646
## Reference
4747

48+
This feature is not supported by `vchordg`.
49+
4850
### Search Parameters <badge type="info" text="vchordrq" />
4951

5052
#### `vchordrq.prefilter` <badge type="tip" text="since v0.4.0" />

src/vectorchord/usage/prewarm.md

Lines changed: 11 additions & 0 deletions
@@ -28,3 +28,14 @@ It works well even if the index size is much larger than memory size.
2828
- `regclass`, an object identifier of the `vchordrq` index
2929
- Example:
3030
- `SELECT vchordrq_prewarm('items_embedding_idx')`
31+
32+
### Functions <badge type="info" text="vchordg" />
33+
34+
#### `vchordg_prewarm` <badge type="tip" text="since v0.5.0" />
35+
36+
- Description: This function warms the `vchordg` index by loading the index into the buffer pool.
37+
- Result: `text`
38+
- Arguments:
39+
- `regclass`, an object identifier of the `vchordg` index
40+
- Example:
41+
- `SELECT vchordg_prewarm('items_embedding_idx')`
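
If the index name is not known, it can be looked up first. The sketch below uses PostgreSQL's standard `pg_indexes` view and assumes a table named `items`; `items_embedding_idx` is the name PostgreSQL generates by default for an unnamed index on `items (embedding)`.

```sql
-- Find vchordg indexes defined on the items table.
SELECT indexname
FROM pg_indexes
WHERE tablename = 'items'
  AND indexdef LIKE '%USING vchordg%';

-- Prewarm the index found above.
SELECT vchordg_prewarm('items_embedding_idx');
```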
