Closed
Description
What happened?
When a diskann index is created on a vector column, running a nearest-neighbor query that touches rows with NULL embeddings causes the PostgreSQL server process to crash with signal 11 (Segmentation fault). The issue does not occur if the index is dropped or if you force a sequential scan.
Expected behavior:
The query should return the nearest neighbor for each row (the reproduction below uses LIMIT 1), skipping or ignoring any NULL embeddings, without crashing the server.
Actual behavior:
PostgreSQL terminates the server process with a segmentation fault (signal 11); the full log is under "Relevant log output and stack trace" below.
Workarounds:
- Filtering out NULL embeddings in the application or in a CTE (WHERE embedding IS NOT NULL) prevents the crash.
- Dropping/disabling the diskann index and forcing a sequential scan also avoids the segfault (albeit at a performance cost).
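The two workarounds above can be sketched as follows, using the articles_poc table from the reproduction. This is a sketch rather than a verified fix: it assumes the crash is triggered when a NULL a.embedding is passed to the index scan as the query vector, which the report does not confirm. The output column aliases (neighbor_id, neighbor_embedding) are made up for illustration.

```sql
-- Workaround 1: filter out NULL embeddings in a CTE before the lateral join,
-- so the index scan is never driven by a NULL query vector (assumed cause).
WITH non_null AS (
    SELECT id, embedding
    FROM articles_poc
    WHERE embedding IS NOT NULL
)
SELECT a.id,
       nb.id        AS neighbor_id,
       nb.embedding AS neighbor_embedding
FROM non_null a
CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
) AS nb;

-- Workaround 2: keep the index but steer the planner to a sequential scan
-- for this transaction only (slower on large tables).
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SELECT *
FROM articles_poc a
CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
) AS nb;
COMMIT;
```

SET LOCAL limits the planner change to the enclosing transaction, so other sessions and later queries still use the diskann index.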
pgvectorscale extension affected
0.7.1
PostgreSQL version used
17.5
What operating system did you use?
Ubuntu 24.04
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
2025-06-17T23:05:46.960656325Z 2025-06-17 23:05:46.959 UTC [1] LOG: server process (PID 276) was terminated by signal 11: Segmentation fault
2025-06-17T23:05:46.960696355Z 2025-06-17 23:05:46.959 UTC [1] DETAIL: Failed process was running:
2025-06-17T23:05:46.960700631Z
2025-06-17T23:05:46.960704958Z -- 5. The query
2025-06-17T23:05:46.960708843Z select *
2025-06-17T23:05:46.960712409Z FROM articles_poc a
2025-06-17T23:05:46.960715953Z CROSS JOIN LATERAL (
2025-06-17T23:05:46.960719609Z SELECT b.id, b.embedding
2025-06-17T23:05:46.960723315Z FROM articles_poc b
2025-06-17T23:05:46.960727060Z ORDER BY b.embedding <=> a.embedding
2025-06-17T23:05:46.960731116Z LIMIT 1
2025-06-17T23:05:46.960734941Z ) AS nb
2025-06-17T23:05:46.960738638Z 2025-06-17 23:05:46.959 UTC [1] LOG: terminating any other active server processes
2025-06-17T23:05:46.975809166Z 2025-06-17 23:05:46.975 UTC [1] LOG: all server processes terminated; reinitializing
How can we reproduce the bug?
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
BEGIN;
-- Force the planner to use the index
SET LOCAL enable_seqscan = off;
SET LOCAL enable_indexscan = on;
SET LOCAL enable_indexonlyscan = on;
-- 1. Enable vector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- 2. Create table
DROP TABLE IF EXISTS articles_poc;
CREATE TABLE articles_poc (
    id serial PRIMARY KEY,
    embedding vector(3)
);
-- 3. Insert rows
INSERT INTO articles_poc (embedding) VALUES
    (ARRAY[0.1,0.2,0.3]::vector),
    (NULL),
    (ARRAY[0.2,0.1,0.4]::vector);
-- 4. Drop & recreate index
DROP INDEX IF EXISTS embedding_diskann_idx;
CREATE INDEX embedding_diskann_idx
    ON articles_poc USING diskann (embedding);
-- 5. The query
select *
FROM articles_poc a
CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
) AS nb;
COMMIT;
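To check which plan the reproduction takes without triggering the crash, the query can be passed to EXPLAIN, which plans the statement but does not execute it. This is a sketch that assumes the table and index from the steps above already exist. Do not add ANALYZE, since that would run the query.

```sql
BEGIN;
-- Same planner setting as the reproduction, scoped to this transaction
SET LOCAL enable_seqscan = off;
-- Plain EXPLAIN only plans the query; it does not run the index scan
EXPLAIN
SELECT *
FROM articles_poc a
CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
) AS nb;
ROLLBACK;
```

If the plan for the inner subquery shows an index scan on embedding_diskann_idx, the crashing code path is the one being exercised; a sequential scan followed by a sort would suggest the planner settings did not take effect.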
Are you going to work on the bugfix?
🆘 No, could someone else please work on the bugfix?