[Bug]: Segmentation fault when querying with diskann index on vector column containing NULLs #238

@samuelscheit

What happened?

When a diskann index is created on a vector column, running a nearest-neighbor query that touches rows with NULL embeddings causes the PostgreSQL server process to crash with signal 11 (Segmentation fault). The issue does not occur if the index is dropped or if you force a sequential scan.

Expected behavior:

The query should return the nearest neighbor for each row (skipping or ignoring any NULL embeddings) without crashing the server.

Actual behavior:

PostgreSQL terminates the server process with a segmentation fault (signal 11). The server log is included under "Relevant log output and stack trace" below.

Workarounds:

  • Filtering out NULL embeddings in the application or in a CTE (WHERE embedding IS NOT NULL) prevents the crash.
  • Dropping/disabling the diskann index and forcing a sequential scan also avoids the segfault (albeit at a performance cost).
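
For reference, the two workarounds above could look roughly like this against the articles_poc table from the reproduction script. This is a sketch, not a fix. The first variant filters NULLs on both the outer and the inner side, because the report does not establish which side triggers the crash. The planner settings in the second variant are an assumption about how to steer PostgreSQL away from the diskann index in a single transaction.

```sql
-- Workaround 1 (sketch): keep NULL embeddings out of both sides of the join.
SELECT a.id,
       nb.id        AS neighbor_id,
       nb.embedding AS neighbor_embedding
FROM articles_poc a
CROSS JOIN LATERAL (
  SELECT b.id, b.embedding
  FROM articles_poc b
  WHERE b.embedding IS NOT NULL          -- never rank NULL candidates
  ORDER BY b.embedding <=> a.embedding
  LIMIT 1
) AS nb
WHERE a.embedding IS NOT NULL;           -- never use NULL as the query vector

-- Workaround 2 (sketch): discourage index scans for this transaction only,
-- so the planner falls back to a sequential scan plus sort.
BEGIN;
  SET LOCAL enable_seqscan       = on;
  SET LOCAL enable_indexscan     = off;
  SET LOCAL enable_indexonlyscan = off;

  SELECT a.id, nb.id AS neighbor_id
  FROM articles_poc a
  CROSS JOIN LATERAL (
    SELECT b.id
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
  ) AS nb;
COMMIT;
```

SET LOCAL only lasts until the end of the transaction, so other sessions and later queries keep using the index.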

pgvectorscale extension affected

0.7.1

PostgreSQL version used

17.5

What operating system did you use?

Ubuntu 24.04

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

2025-06-17T23:05:46.960656325Z 2025-06-17 23:05:46.959 UTC [1] LOG:  server process (PID 276) was terminated by signal 11: Segmentation fault
2025-06-17T23:05:46.960696355Z 2025-06-17 23:05:46.959 UTC [1] DETAIL:  Failed process was running: 
2025-06-17T23:05:46.960700631Z 	
2025-06-17T23:05:46.960704958Z 	  -- 5. The query
2025-06-17T23:05:46.960708843Z 	  select *
2025-06-17T23:05:46.960712409Z 	  FROM articles_poc a
2025-06-17T23:05:46.960715953Z 	  CROSS JOIN LATERAL (
2025-06-17T23:05:46.960719609Z 	    SELECT b.id, b.embedding
2025-06-17T23:05:46.960723315Z 	    FROM articles_poc b
2025-06-17T23:05:46.960727060Z 	    ORDER BY b.embedding <=> a.embedding
2025-06-17T23:05:46.960731116Z 	    LIMIT 1
2025-06-17T23:05:46.960734941Z 	  ) AS nb
2025-06-17T23:05:46.960738638Z 2025-06-17 23:05:46.959 UTC [1] LOG:  terminating any other active server processes
2025-06-17T23:05:46.975809166Z 2025-06-17 23:05:46.975 UTC [1] LOG:  all server processes terminated; reinitializing

How can we reproduce the bug?

CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

BEGIN;

  -- Force the planner to use the index
  SET LOCAL enable_seqscan       = off;
  SET LOCAL enable_indexscan     = on;
  SET LOCAL enable_indexonlyscan = on;

  -- 1. Enable vector extension
  CREATE EXTENSION IF NOT EXISTS vector;

  -- 2. Create table
  DROP TABLE IF EXISTS articles_poc;
  CREATE TABLE articles_poc (
    id serial PRIMARY KEY,
    embedding vector(3)
  );

  -- 3. Insert rows
  INSERT INTO articles_poc (embedding) VALUES
    (ARRAY[0.1,0.2,0.3]::vector),
    (NULL),
    (ARRAY[0.2,0.1,0.4]::vector);

  -- 4. Drop & recreate index
  DROP INDEX IF EXISTS embedding_diskann_idx;
  CREATE INDEX embedding_diskann_idx
    ON articles_poc USING diskann (embedding);

  -- 5. The query
  select *
  FROM articles_poc a
  CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
  ) AS nb;

COMMIT;
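
To confirm that the planner actually chooses the diskann index for the query in step 5, the plan can be inspected with plain EXPLAIN, which plans the query but does not execute it. The sketch below assumes the table and index from the script above already exist. What the plan output looks like will depend on the installation, so none is shown here.

```sql
BEGIN;
  -- Same planner settings as the reproduction script.
  SET LOCAL enable_seqscan       = off;
  SET LOCAL enable_indexscan     = on;
  SET LOCAL enable_indexonlyscan = on;

  -- EXPLAIN without ANALYZE only shows the chosen plan; the query is not run.
  EXPLAIN
  SELECT *
  FROM articles_poc a
  CROSS JOIN LATERAL (
    SELECT b.id, b.embedding
    FROM articles_poc b
    ORDER BY b.embedding <=> a.embedding
    LIMIT 1
  ) AS nb;
ROLLBACK;
```

If the inner side of the plan shows an index scan using embedding_diskann_idx, the reproduction is exercising the index path described in this report.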

Are you going to work on the bugfix?

🆘 No, could someone else please work on the bugfix?
