Skip to content

Conversation

cpegeric
Copy link
Contributor

@cpegeric cpegeric commented Aug 18, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #22368

What this PR does / why we need it:

index support with the same order by distance function.


PR Type

Bug fix


Description

  • Enable distance function optimization for vector projections

  • Fix commented-out code in HNSW index optimization

  • Add IVFFlat index support for duplicate distance functions

  • Add test cases for projection with distance functions


Diagram Walkthrough

flowchart LR
  A["Query with distance function"] --> B["Find duplicate distance functions"]
  B --> C["Replace with column reference"]
  C --> D["Optimized execution plan"]
Loading

File Walkthrough

Relevant files
Bug fix
apply_indices_hnsw.go
Enable HNSW distance function optimization                             

pkg/sql/plan/apply_indices_hnsw.go

  • Uncommented code for finding equal distance functions in projections
  • Added logic to replace duplicate distance functions with column
    references
  • Removed stray comment markers and cleaned up code structure
+14/-16 
Enhancement
apply_indices_ivfflat.go
Add IVFFlat distance function optimization                             

pkg/sql/plan/apply_indices_ivfflat.go

  • Added identical optimization logic for IVFFlat indices
  • Implemented duplicate distance function detection and replacement
  • Ensures consistent behavior between HNSW and IVFFlat indices
+16/-0   
Tests
vector_hnsw.result
Add HNSW projection test results                                                 

test/distributed/cases/vector/vector_hnsw.result

  • Added test result for query with distance function in projection
  • Shows expected output with orderbyfn column containing distance values
+4/-0     
vector_hnsw.sql
Add HNSW projection test case                                                       

test/distributed/cases/vector/vector_hnsw.sql

  • Added test case for selecting distance function in projection with
    ORDER BY
  • Tests optimization of duplicate distance function calculations
+1/-0     
vector_index.result
Add IVFFlat projection test results                                           

test/distributed/cases/vector/vector_index.result

  • Added test result for IVFFlat index with distance function in
    projection
  • Validates correct distance values in orderbyfn column
+4/-0     
vector_index.sql
Add IVFFlat projection test case                                                 

test/distributed/cases/vector/vector_index.sql

  • Added test case for IVFFlat index with distance function projection
  • Ensures consistent testing across different vector index types
+1/-0     

@mergify mergify bot added the kind/bug Something isn't working label Aug 18, 2025
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

6168 - Partially compliant

Compliant requirements:

Non-compliant requirements:

  • Fix panic when inserting empty string "" into TIME type.
  • Correct filtering behavior when comparing TIME with string like "23".
  • Allow inserting a numeric-like string "101412" into TIME primary key as appropriate, or handle with proper error.
  • Fix errors in TIME type expressions (arithmetic with TIME and integers).

Requires further human verification:

  • Validate at runtime that TIME-related inserts, filters, and expressions behave as per MySQL compatibility expectations in the referenced ticket, since this PR code changes vector index planning unrelated to TIME type.

6168 - Partially compliant

Compliant requirements:

Non-compliant requirements:

  • Fix panic when inserting empty string "" into TIME type.
  • Correct filtering behavior when comparing TIME with string like "23".
  • Allow inserting a numeric-like string "101412" into TIME primary key as appropriate, or handle with proper error.
  • Fix errors in TIME type expressions (arithmetic with TIME and integers).

Requires further human verification:

  • Validate at runtime that TIME-related inserts, filters, and expressions behave as per MySQL compatibility expectations in the referenced ticket, since this PR code changes vector index planning unrelated to TIME type.
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The reintroduced logic replaces projection expressions with a ColRef assuming the score is always at column position 1 and uses the first binding tag. Verify that curr_node.TableDef.Cols[1], BindingTags[0], and ColPos: 1 are valid in all query shapes and cannot cause out-of-bounds or wrong-column references.

// replace the project with ColRef (same distFn as the order by)
for _, id := range projids {
	projNode.ProjectList[id] = &Expr{
		Typ: curr_node.TableDef.Cols[1].Typ,
		Expr: &plan.Expr_Col{
			Col: &plan.ColRef{
				RelPos: curr_node.BindingTags[0],
				ColPos: 1, // score
			},
		},
	}
}
Dead Code Artifact

There is a stray closing comment marker adjusted nearby; ensure no unmatched comment delimiters remain and the nearby helper findPkFromProject block comment state is intentional.

func (builder *QueryBuilder) findPkFromProject(projNode *plan.Node, pkPos int32) []int32 {

	projids := make([]int32, 0)
	for i, expr := range projNode.ProjectList {
		if expr.GetCol() != nil {
			if expr.GetCol().ColPos == pkPos {
				projids = append(projids, int32(i))
			}
		}
	}
	return projids
}
*/

// e.g. SELECT a, L2_DISTANCE(b, '[0, ..]') FROM SRC ORDER BY L2_DISTANCE(b, '[0,..]') limit 4
Assumption Consistency

The same ColRef replacement logic is duplicated for IVFFlat. Confirm that IVFFlat plans share the same column ordering (score at position 1) and bindings as HNSW to avoid incorrect plan references.

// replace the project with ColRef (same distFn as the order by)
for _, id := range projids {
	projNode.ProjectList[id] = &Expr{
		Typ: curr_node.TableDef.Cols[1].Typ,
		Expr: &plan.Expr_Col{
			Col: &plan.ColRef{
				RelPos: curr_node.BindingTags[0],
				ColPos: 1, // score
			},
		},
	}
}

Copy link

qodo-merge-pro bot commented Aug 18, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Misaligned ticket and PR scope

The PR addresses vector index distance-function projection optimization, but the
linked ticket is about TIME type bugs; this misalignment risks bypassing triage
and urgency for a severity S1 issue. Either retarget this PR to the correct
issue or update the ticket and justification to ensure the fix matches the
reported problem and priority.

Examples:

Solution Walkthrough:

Before:

// In PR metadata
PR Title: "fix: enable distance function optimization"
PR Description:
  ...
  Fixes: #6168
  ...

// In Ticket #6168
Ticket Title: "[Bug]: time type some bugs"
Severity: S1

After:

// In PR metadata
PR Title: "feat: enable distance function optimization"
PR Description:
  ...
  Fixes: #NEW_TICKET_FOR_VECTOR_OPTIMIZATION
  ...

// Ticket #6168 remains open and unlinked from this PR
Ticket Title: "[Bug]: time type some bugs"
Severity: S1

Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical disconnect between the PR's implementation (vector optimization) and the linked S1 ticket (TIME type bugs), which is a major process and tracking issue.

High
Possible issue
Add safety guards to projection rewrite

Guard the replacement with a bounds check to ensure curr_node.TableDef.Cols has
at least two columns and that BindingTags is non-empty. Also ensure
projNode.ProjectList[id] is a function matching the detected distance function
before overwriting, to avoid corrupting unrelated projections.

pkg/sql/plan/apply_indices_hnsw.go [257-271]

 // check equal distFn and only compute once for equal function()
 projids := builder.findEqualDistFnFromProject(projNode, distFnExpr)
 
 // replace the project with ColRef (same distFn as the order by)
-for _, id := range projids {
-	projNode.ProjectList[id] = &Expr{
-		Typ: curr_node.TableDef.Cols[1].Typ,
-		Expr: &plan.Expr_Col{
-			Col: &plan.ColRef{
-				RelPos: curr_node.BindingTags[0],
-				ColPos: 1, // score
-			},
-		},
+if curr_node != nil && len(curr_node.TableDef.Cols) > 1 && len(curr_node.BindingTags) > 0 {
+	for _, id := range projids {
+		if id >= 0 && int(id) < len(projNode.ProjectList) {
+			if projNode.ProjectList[id].GetF() != nil {
+				projNode.ProjectList[id] = &Expr{
+					Typ: curr_node.TableDef.Cols[1].Typ,
+					Expr: &plan.Expr_Col{
+						Col: &plan.ColRef{
+							RelPos: curr_node.BindingTags[0],
+							ColPos: 1, // score
+						},
+					},
+				}
+			}
+		}
 	}
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies potential panic risks from out-of-bounds slice access on curr_node.TableDef.Cols and curr_node.BindingTags, and adds necessary nil and bounds checks to prevent runtime errors, significantly improving code robustness.

Medium
Add guards to IVFFlat projection rewrite

Mirror the HNSW guards here: verify curr_node has at least two columns and
binding tags, check id bounds, and ensure the target projection is actually the
detected distance function before overwriting. This prevents panics and
unintended rewrites.

pkg/sql/plan/apply_indices_ivfflat.go [285-299]

 // check equal distFn and only compute once for equal function()
 projids := builder.findEqualDistFnFromProject(projNode, distFnExpr)
 
 // replace the project with ColRef (same distFn as the order by)
-for _, id := range projids {
-	projNode.ProjectList[id] = &Expr{
-		Typ: curr_node.TableDef.Cols[1].Typ,
-		Expr: &plan.Expr_Col{
-			Col: &plan.ColRef{
-				RelPos: curr_node.BindingTags[0],
-				ColPos: 1, // score
-			},
-		},
+if curr_node != nil && len(curr_node.TableDef.Cols) > 1 && len(curr_node.BindingTags) > 0 {
+	for _, id := range projids {
+		if id >= 0 && int(id) < len(projNode.ProjectList) {
+			if projNode.ProjectList[id].GetF() != nil {
+				projNode.ProjectList[id] = &Expr{
+					Typ: curr_node.TableDef.Cols[1].Typ,
+					Expr: &plan.Expr_Col{
+						Col: &plan.ColRef{
+							RelPos: curr_node.BindingTags[0],
+							ColPos: 1, // score
+						},
+					},
+				}
+			}
+		}
 	}
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies potential panic risks from out-of-bounds slice access on curr_node.TableDef.Cols and curr_node.BindingTags, and adds necessary nil and bounds checks to prevent runtime errors, significantly improving code robustness.

Medium
  • Update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working Review effort 2/5 size/S Denotes a PR that changes [10,99] lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants