0.5.0 #136

Merged
merged 5 commits into from
Aug 15, 2025

2 changes: 2 additions & 0 deletions .vitepress/config.mts
@@ -191,9 +191,11 @@ export default defineConfig({
items: [
{ text: 'Indexing', link: '/vectorchord/usage/indexing' },
{ text: 'Multi-Vector Retrieval', link: '/vectorchord/usage/indexing-with-maxsim-operators' },
{ text: 'Graph Index', link: '/vectorchord/usage/graph-index' },
{ text: 'Similarity Filter', link: '/vectorchord/usage/range-query' },
{ text: 'PostgreSQL Tuning', link: '/vectorchord/usage/performance-tuning' },
{ text: 'Monitoring', link: '/vectorchord/usage/monitoring' },
{ text: 'Measure Recall', link: '/vectorchord/usage/measure-recall' },
{ text: 'Prewarm', link: '/vectorchord/usage/prewarm' },
{ text: 'Prefilter', link: '/vectorchord/usage/prefilter' },
{ text: 'Prefetch', link: '/vectorchord/usage/prefetch' },
2 changes: 1 addition & 1 deletion src/vectorchord/admin/kubernetes.md
@@ -150,7 +150,7 @@ ttensorchord=> \dx
Name | Version | Schema | Description
---------+---------+------------+---------------------------------------------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vchord | 0.4.3 | public | vchord: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
vchord | 0.5.0 | public | vchord: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
vector | 0.8.0 | public | vector data type and ivfflat and hnsw access methods
(3 rows)
```
2 changes: 1 addition & 1 deletion src/vectorchord/admin/migration.md
@@ -75,7 +75,7 @@ postgres=# \dx
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vchord | 0.1.0 | public | vchord: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
vector | 0.8.0 | public | vector data type and ivfflat and hnsw access methods
vectors | 0.4.3 | vectors | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
vectors | 0.5.0 | vectors | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
```

:::
76 changes: 34 additions & 42 deletions src/vectorchord/getting-started/installation.md
@@ -1,6 +1,17 @@
# Installation

There are four ways to install VectorChord.
VectorChord is tested on the following operating systems:

* Ubuntu (x86_64, aarch64)
* macOS (aarch64)
* Windows (x86_64)
* Alpine Linux (x86_64, aarch64) [^1]

[^1]: On Alpine Linux 3.22, VectorChord is tested with PostgreSQL 15 from the `community` repository and with PostgreSQL 16 and 17 from the `main` repository.

Please report a bug if you encounter issues on any of the above operating systems, or submit a feature request for additional platform support.

There are 4 ways to install VectorChord.

## Docker

@@ -17,7 +28,7 @@ docker run \
--name vchord-demo \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d tensorchord/vchord-postgres:pg17-v0.4.3
-d tensorchord/vchord-postgres:pg17-v0.5.0
```

2. Connect to the database using the `psql` command line tool. The default username is `postgres`.
@@ -62,22 +73,13 @@ Other sections may align with the above.

## Debian packages

::: tip

Installation from Debian packages requires a dependency on `GLIBC >= 2.35`, so only the following distributions are supported:

- `Debian 12 (Bookworm)` or later
- `Ubuntu 22.04` or later

:::

Debian packages are used for Debian-based Linux distributions, including Debian and Ubuntu. They can be easily installed by `apt`. You can use this installation method on x86_64 Linux and aarch64 Linux.

1. Download Debian packages in [the release page](https://github.com/tensorchord/VectorChord/releases/latest), and install them by `apt`.

```sh
wget https://github.com/tensorchord/VectorChord/releases/download/0.4.3/postgresql-17-vchord_0.4.3-1_$(dpkg --print-architecture).deb
sudo apt install ./postgresql-17-vchord_0.4.3-1_$(dpkg --print-architecture).deb
wget https://github.com/tensorchord/VectorChord/releases/download/0.5.0/postgresql-17-vchord_0.5.0-1_$(dpkg --print-architecture).deb
sudo apt install ./postgresql-17-vchord_0.5.0-1_$(dpkg --print-architecture).deb
```

2. Configure your PostgreSQL by modifying the `shared_preload_libraries` to include the extension. And then restart the PostgreSQL cluster.
@@ -130,6 +132,12 @@ CREATE EXTENSION IF NOT EXISTS vchord CASCADE;

## PGXN

::: tip

See [Source](#source) for build requirements.

:::

1. Install VectorChord from [PostgreSQL Extension Network](https://pgxn.org/dist/vchord) with:

```sh
@@ -158,53 +166,37 @@ There is a broken VectorChord `0.4.1` package on PGXN. Please do not use it. Use

::: tip

VectorChord supports UNIX-like operating systems and Windows. Please report an issue if you cannot compile or make it work.
Build requirements:

VectorChord supports little-endian architectures but only provides performance advantages on x86_64 and aarch64.
* any port of `make`
* `clang >= 16` with `libclang`
* `rust >= 1.89` with `cargo`

:::

You may need to install VectorChord from source. Please follow these steps.

1. Clone the repository and checkout the branch.

```sh
git clone https://github.com/tensorchord/VectorChord.git
cd VectorChord
git checkout "0.4.3"
```
It's recommended to use Rustup for installing Rust on most platforms, while on Alpine Linux, using the system package manager is advised.

2. Install a C compiler and Rust. For Clang, the version must be 16 or higher. For GCC, the version must be 14 or higher. Other C compilers are not supported, and we prefer and recommend using Clang. For Rust, the version must be the same as that recorded in `rust-toolchain.toml`.
You can set the environment variable `CC` to specify the desired C compiler for the build system. If you do not set this variable, the build system automatically searches for clang and gcc. To compile all C code with clang, set `CC` to the path of clang. To compile all C code with gcc, set `CC` to the path of gcc; note that in this case, there is a requirement `gcc >= 14`.

You could download Clang from https://github.com/llvm/llvm-project/releases.
The Rust version requirement is not a long-term guarantee; we will raise the required Rust version with each new release.

You could setup Rust with Rustup. See https://rustup.rs/.
:::

3. Build it and install it.
1. Download the source code, build and install it with `make`.

```sh
curl -fsSL https://github.com/tensorchord/VectorChord/archive/refs/tags/0.5.0.tar.gz | tar -xz
cd VectorChord-0.5.0
make build
make install # or `sudo make install`
```

4. Configure your PostgreSQL by modifying the `shared_preload_libraries` to include the extension. And then restart the PostgreSQL cluster.
2. Configure your PostgreSQL by modifying the `shared_preload_libraries` to include the extension. And then restart the PostgreSQL cluster.

```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vchord"'
```

5. Run the following SQL to ensure the extension is enabled.
3. Run the following SQL to ensure the extension is enabled.

```sql
CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
```

::: tip

By default, `VectorChord` only finds `clang` in `PATH` as the C compiler.

If your Clang executable is not named `clang` or is not in `PATH`, please set the environment variable `CC` to path of your Clang.

If you prefer to use GCC, please set the environment variable `CC` to `gcc` or path of your GCC.

:::
4 changes: 3 additions & 1 deletion src/vectorchord/getting-started/overview.md
@@ -37,7 +37,7 @@ docker run \
--name vectorchord-demo \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d tensorchord/vchord-postgres:pg17-v0.4.3
-d tensorchord/vchord-postgres:pg17-v0.5.0
```
> In addition to the base image with the VectorChord extension, we provide an all-in-one image, `tensorchord/vchord-suite:pg17-latest`. This comprehensive image includes all official TensorChord extensions. Developers should select an image tag that is compatible with their extension's version, as indicated in [the support matrix](https://github.com/tensorchord/VectorChord-images?tab=readme-ov-file#support-matrix).

@@ -78,9 +78,11 @@ For more usage, please read:

- [Indexing](/vectorchord/usage/indexing)
- [Multi-Vector Retrieval](/vectorchord/usage/indexing-with-maxsim-operators)
- [Graph Index](/vectorchord/usage/graph-index)
- [Similarity Filter](/vectorchord/usage/range-query)
- [PostgreSQL Tuning](/vectorchord/usage/performance-tuning)
- [Monitoring](/vectorchord/usage/monitoring)
- [Measure Recall](/vectorchord/usage/measure-recall)
- [Prewarm](/vectorchord/usage/prewarm)
- [Prefilter](/vectorchord/usage/prefilter)
- [Prefetch](/vectorchord/usage/prefetch)
2 changes: 2 additions & 0 deletions src/vectorchord/index.md
@@ -10,9 +10,11 @@

- [Indexing](/vectorchord/usage/indexing)
- [Multi-Vector Retrieval](/vectorchord/usage/indexing-with-maxsim-operators)
- [Graph Index](/vectorchord/usage/graph-index)
- [Similarity Filter](/vectorchord/usage/range-query)
- [PostgreSQL Tuning](/vectorchord/usage/performance-tuning)
- [Monitoring](/vectorchord/usage/monitoring)
- [Measure Recall](/vectorchord/usage/measure-recall)
- [Prewarm](/vectorchord/usage/prewarm)
- [Prefilter](/vectorchord/usage/prefilter)
- [Prefetch](/vectorchord/usage/prefetch)
2 changes: 2 additions & 0 deletions src/vectorchord/usage/external-build.md
@@ -32,3 +32,5 @@ $$);
```

To simplify the workflow, we provide end-to-end scripts for external index pre-computation, refer to [Run External Index Precomputation Toolkit](https://github.com/tensorchord/VectorChord/tree/main/scripts#run-external-index-precomputation-toolkit).

`vchordg` does not support this feature, because building a `vchordg` index does not include this step.
134 changes: 134 additions & 0 deletions src/vectorchord/usage/graph-index.md
@@ -0,0 +1,134 @@
# Graph Index <badge type="tip" text="since v0.5.0" />

VectorChord's index type `vchordg` is a disk-based graph index. It has low memory consumption.

Let's start by creating a table named `items` with an `embedding` column of type `vector(n)`, and then populate it with sample data.

To create a `vchordg` index, you can use the following SQL.

```sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);
```

::: tip
This feature is in preview.
:::

## Tuning

When building an index, two options usually need tuning: `m` and `ef_construction`. `m` is the maximum number of neighbors per vertex, and `ef_construction` is the size of the dynamic list containing the nearest neighbors during insertion.

In search, you need to tune `ef_search`. `ef_search` is the size of the dynamic list containing the nearest neighbors during search.

```sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
m = 64
ef_construction = 128
$$);

SET vchordg.ef_search TO '128';
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
```

As a disk-based index, `vchordg` usually requires only the quantized vectors to be kept in the buffer pool to maintain performance. By default, `vchordg` quantizes a $D$-dimensional vector to $2D$ bits. Let the number of rows be $N$. Then, the total memory required for the index is $2DN$ bits. If you have very limited memory and are using ultra-high dimensional vectors, consider quantizing a vector to $D$ bits. Then, the total memory required for the index is $DN$ bits.

```sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
bits = 1
m = 64
ef_construction = 128
$$);
```
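
The memory estimate above can be worked through for a concrete table. The following is a minimal sketch in Python, not part of VectorChord: the helper names `quantized_index_bits` and `to_mib` and the workload (768 dimensions, one million rows) are assumptions chosen only for illustration. It covers the quantized vectors alone, as in the formula above.

```python
def quantized_index_bits(dims: int, rows: int, bits: int = 2) -> int:
    """Return bits * D * N: the size of the quantized vectors, in bits.

    `bits` mirrors the `bits` indexing option (default 2), `dims` is D
    and `rows` is N. Hypothetical helper, for illustration only.
    """
    if dims < 1 or rows < 0 or bits < 1:
        raise ValueError("dims and bits must be positive, rows non-negative")
    return bits * dims * rows


def to_mib(n_bits: int) -> float:
    """Convert a number of bits to mebibytes."""
    return n_bits / 8 / 1024 / 1024


# Hypothetical workload: 1,000,000 rows of 768-dimensional vectors.
dims, rows = 768, 1_000_000
default_bits = quantized_index_bits(dims, rows)           # 2DN bits
compact_bits = quantized_index_bits(dims, rows, bits=1)   # DN bits

print(f"bits = 2: {default_bits} bits, about {to_mib(default_bits):.1f} MiB")
print(f"bits = 1: {compact_bits} bits, about {to_mib(compact_bits):.1f} MiB")
```

Halving `bits` halves this estimate, which is the trade-off the `bits = 1` option makes against recall and QPS.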

The index building can be sped up by using multiple processes. Refer to [PostgreSQL Tuning](performance-tuning.md#indexing).

## Reference

### Operator Classes <badge type="info" text="vchordg" /> {#operator-classes}

The following table lists all available operator classes supported by `vchordg`.

| Operator Class | Description | Operator 1 | Operator 2 |
| -------------------- | --------------------------------------------------------- | ---------------------- | ------------------------ |
| `vector_l2_ops` | index works for `vector` type and Euclidean distance | `<->(vector,vector)` | `<<->>(vector,vector)` |
| `vector_ip_ops` | index works for `vector` type and negative inner product | `<#>(vector,vector)` | `<<#>>(vector,vector)` |
| `vector_cosine_ops` | index works for `vector` type and cosine distance | `<=>(vector,vector)` | `<<=>>(vector,vector)` |
| `halfvec_l2_ops` | index works for `halfvec` type and Euclidean distance | `<->(halfvec,halfvec)` | `<<->>(halfvec,halfvec)` |
| `halfvec_ip_ops` | index works for `halfvec` type and negative inner product | `<#>(halfvec,halfvec)` | `<<#>>(halfvec,halfvec)` |
| `halfvec_cosine_ops` | index works for `halfvec` type and cosine distance | `<=>(halfvec,halfvec)` | `<<=>>(halfvec,halfvec)` |

`<<->>`, `<<#>>`, `<<=>>` are operators defined by VectorChord.

For more information about `<<->>`, `<<#>>`, `<<=>>`, refer to [Similarity Filter](range-query).

All operator classes are available since version `0.5.0`.

### Indexing Options <badge type="info" text="vchordg" />

#### `bits` <badge type="tip" text="since v0.5.0" />

- Description: The ratio of bits to dimensions after RaBitQ quantization. `bits = 2` provides better recall and QPS, while `bits = 1` consumes less memory.
- Type: integer
- Default: `2`
- Example:
- `bits = 2` means a $D$-dimensional vector is quantized to $2D$ bits.
- `bits = 1` means a $D$-dimensional vector is quantized to $D$ bits.

#### `m` <badge type="tip" text="since v0.5.0" />

- Description: The maximum number of neighbors per vertex. The larger `m` is, the better the performance, but the higher the storage requirement. `m` corresponds to $m_0$ in HNSW and $m$ in DiskANN.
- Type: integer
- Default: `32`
- Example:
- `m = 32` means that there are at most $32$ neighbors per vertex.
- `m = 64` means that there are at most $64$ neighbors per vertex.

#### `ef_construction` <badge type="tip" text="since v0.5.0" />

- Description: The size of the dynamic list containing the nearest neighbors during insertion. The larger `ef_construction` is, the better the performance, but the slower the insertion. `ef_construction` corresponds to $\text{ef}_\text{construction}$ in HNSW and $\text{ef}_C$ in DiskANN.
- Type: integer
- Default: `64`
- Example:
- `ef_construction = 64` means that the size of the dynamic list containing the nearest neighbors is $64$ during insertion.
- `ef_construction = 128` means that the size of the dynamic list containing the nearest neighbors is $128$ during insertion.

#### `alpha` <badge type="tip" text="since v0.5.0" />

- Description: The `alpha` values selected during pruning. `alpha` corresponds to $\alpha$ in DiskANN. This option must be an ascending list, where the first element is `1.0` and the last element is less than `2.0`.
- Type: list of floats
- Default: `[1.0, 1.2]`
- Example:
- `alpha = [1.0, 1.2]` is equivalent to setting `alpha = 1.2` in DiskANN.
- `alpha = [1.0]` is equivalent to the default pruning strategy in HNSW.
- Note: this option is ineffective when the distance metric is negative inner product.
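
The constraints on `alpha` stated above can be checked before an index is created. The following Python sketch is only an illustration: `is_valid_alpha` is a hypothetical helper, not a VectorChord API, and it assumes that "ascending" means strictly ascending, which the description does not spell out.

```python
def is_valid_alpha(alpha: list[float]) -> bool:
    """Check a candidate `alpha` list against the documented constraints.

    Hypothetical helper: the list must be non-empty, start at 1.0,
    end below 2.0, and be ascending (assumed here to be strict).
    """
    if not alpha:
        return False
    if alpha[0] != 1.0:
        return False
    if alpha[-1] >= 2.0:
        return False
    # Every element must be smaller than its successor.
    return all(a < b for a, b in zip(alpha, alpha[1:]))


print(is_valid_alpha([1.0, 1.2]))  # the default value
print(is_valid_alpha([1.0]))       # the HNSW-like setting
print(is_valid_alpha([1.2, 1.0]))  # not ascending, first element is not 1.0
```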

#### `beam_construction` <badge type="tip" text="since v0.5.0" />

- Description: Beam width used during insertion. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default: `1`
- Example:
- `beam_construction = 8` means that the index accesses 8 vertices at once during insertion.
- `beam_construction = 1` means that the index accesses 1 vertex at once during insertion.
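
The effect of beam width on round trips can be illustrated with a back-of-the-envelope model. This Python sketch is a simplification and does not describe how VectorChord schedules I/O: `estimated_round_trips` is a hypothetical helper, and it assumes that the number of vertices to read is fixed and that each round trip fetches up to `beam_width` of them.

```python
def estimated_round_trips(vertices_to_read: int, beam_width: int = 1) -> int:
    """Estimate SSD round trips when each trip reads up to `beam_width` vertices.

    Simplified model for illustration only: a real graph traversal
    decides which vertices to read as it goes.
    """
    if vertices_to_read < 0 or beam_width < 1:
        raise ValueError("vertices_to_read must be >= 0 and beam_width >= 1")
    # Ceiling division without floating point.
    return (vertices_to_read + beam_width - 1) // beam_width


# Hypothetical insertion that reads 64 vertices.
for beam in (1, 8):
    trips = estimated_round_trips(64, beam)
    print(f"beam_construction = {beam}: {trips} round trips")
```

In this model the round trips shrink in proportion to the beam width. The extra computation mentioned in the description is not modeled, which is why a larger beam width is not always better.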

### Search Parameters <badge type="info" text="vchordg" />

#### `vchordg.ef_search` <badge type="tip" text="since v0.5.0" />

- Description: The size of the dynamic list containing the nearest neighbors in search. The larger `vchordg.ef_search` is, the better the recall, but the worse the QPS. `vchordg.ef_search` corresponds to $\text{ef}$ in HNSW and DiskANN.
- Type: integer
- Default: `64`
- Domain: `[1, 65535]`
- Example:
- `SET vchordg.ef_search = 64` indicates the size of the dynamic list containing the nearest neighbors is $64$ during search.
- `SET vchordg.ef_search = 128` indicates the size of the dynamic list containing the nearest neighbors is $128$ during search.

#### `vchordg.beam_search` <badge type="tip" text="since v0.5.0" />

- Description: Beam width used during search. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default: `1`
- Domain: `[1, 65535]`
- Example:
- `SET vchordg.beam_search = 8` indicates that the index accesses 8 vertices at once during search.
- `SET vchordg.beam_search = 1` indicates that the index accesses 1 vertex at once during search.
4 changes: 2 additions & 2 deletions src/vectorchord/usage/indexing.md
@@ -2,7 +2,7 @@

VectorChord's index type `vchordrq` divides vectors into lists and searches only a subset of lists closest to the query vector. It provides fast build time and low memory consumption, while delivering [significantly better performance](https://blog.vectorchord.ai/vectorchord-store-400k-vectors-for-1-in-postgresql#heading-ivf-vs-hnsw) than both `hnsw` and `ivfflat`.

To build a vector index, start by creating a table named `items` with an `embedding` column of type `vector(n)`, then populate it with sample data.
Let's start by creating a table named `items` with an `embedding` column of type `vector(n)`, and then populate it with sample data.

```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
@@ -211,7 +211,7 @@ The operator classes for `MaxSim` are available since version `0.3.0`.
- `build.internal.build_threads = 1` means that the K-means algorithm uses $1$ thread.
- `build.internal.build_threads = 4` means that the K-means algorithm uses $4$ threads.

### Search Parameters <badge type="info" text="vchordrq" />
### Search Parameters <badge type="info" text="vchordrq" /> {#search-parameters}

#### `vchordrq.probes`
