-
-
Notifications
You must be signed in to change notification settings - Fork 231
Add support for VECTOR type #3162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
33bc757
to
0bae5a0
Compare
90102d4
to
d955334
Compare
func floatsToString(fs ...float32) string { | ||
result := make([]byte, 4*len(fs)) | ||
binary.Encode(result, binary.LittleEndian, fs) | ||
return string(result) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this function is correct, and thus I don't have confidence that the test is correct. What this produces isn't human readable. I would expect something like the following:
b := make([]byte, 4*len(fs))
for i, f := range fs {
binary.LittleEndian.PutUint32(b[i*4:], math.Float32bits(f))
}
return hex.EncodeToString(b)
But it's not clear what the output is supposed to look like - does one of the tests show the ascii string? I would expect there to be "\x02\x03\xa5" like strings.
id INT PRIMARY KEY, | ||
small_vec VECTOR(2), | ||
medium_vec VECTOR(10), | ||
large_vec VECTOR(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you aren't creating a row large_vec
in your test. Intended? Is this just to verify that it shows when show create table
is run?
case float64: | ||
res[i] = float32(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can result in +/-Inf. Is that OK? Do we need to error in that case?
sql/expression/function/length.go
Outdated
@@ -123,6 +123,10 @@ func (l *Length) Eval(ctx *sql.Context, row sql.Row) (interface{}, error) { | |||
return int32(wrapper.MaxByteLength()), nil | |||
} | |||
|
|||
// Getting the length of a vector doesn't require converting it to a string, it just returns the length in bytes | |||
if vec, ok := val.([]float32); ok { | |||
return int32(4 * len(vec)), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. there are places where we define constants for data width of different low level integer types (eg uint32Size = 4
). I don't actually see anywhere we've done this for floats. I've seen that 4
several times now. Up to you.
sql/types/vector_test.go
Outdated
name: "equal_vectors", | ||
a: []float32{1.0, 2.0, 3.0}, | ||
b: []float32{1.0, 2.0, 3.0}, | ||
expected: 0, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not seeing any negative test cases for these. Am I missing something?
This PR adds a VECTOR type to GMS. Vectors are arrays of 32-bit floats.
It also adds several functions that take vectors as arguments, including converting vectors to and from strings, and functions for computing distances between vectors.
Finally, it ensures that vector types work correctly when passed to existing functions (such as BIT_LENGTH, MD5, etc.)