-
Notifications
You must be signed in to change notification settings - Fork 15
Add necessary support for arrow-backed table UDFs #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| // Define a table UDF | ||
| type arrowTableUdf struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could make sense to expose a utility UDF struct for this, such as NewArrowTableUDF(duckdb.Arrow, arrow.Table). Then library users would only need to write the BindArguments function to construct the arrow.Table from their existing dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be nice to have a separate registration method related to the Arrow and separate UDF type for the arrow data, Could you please wrap the existing chunked table UDF or create a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for Arrow-backed table UDFs by exposing the MoveArrowToDataChunk method from the bindings layer. This enables users to implement table UDFs that efficiently transfer data from Arrow RecordBatches directly into DuckDB DataChunks.
Key Changes:
- Exposed
MoveArrowToDataChunkbinding across all platform-specific arrow mapping files - Renamed
DataChunkFromArrowtoNewDataChunkFromArrowfor consistency with bindings layer - Added
MoveArrowToDataChunkmethod to theArrowtype with proper error handling - Included comprehensive test demonstrating Arrow-backed table UDF implementation
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| arrowmapping/arrow_mapping.go | Updated to expose MoveArrowToDataChunk binding and renamed DataChunkFromArrow to NewDataChunkFromArrow |
| arrowmapping/arrow_mapping_darwin_amd64.go | Platform-specific binding updates for macOS AMD64 |
| arrowmapping/arrow_mapping_darwin_arm64.go | Platform-specific binding updates for macOS ARM64 |
| arrowmapping/arrow_mapping_linux_amd64.go | Platform-specific binding updates for Linux AMD64 |
| arrowmapping/arrow_mapping_linux_arm64.go | Platform-specific binding updates for Linux ARM64 |
| arrowmapping/arrow_mapping_windows_amd64.go | Platform-specific binding updates for Windows AMD64 |
| arrow.go | Added MoveArrowToDataChunk method to Arrow type with error handling |
| arrow_test.go | Added test demonstrating Arrow-backed table UDF with 10,000 row dataset |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@VGSML would be good to get your feedback on this one. |
|
Looks like the arrowmapping changes have to be merged first for CI to be happy. Will wait for review on the overall idea first. |
|
@wmTJc9IK0Q it looks good for me, thank you so much! |
|
@wmTJc9IK0Q @taniabogatsch I have started the discussion for future arrow UDFs development, could you join? |
I've seen the discussion but since I am not using the duckdb-go package myself I don't know how helpful my input would be. I.e., if I understand correctly, you want to discuss the usefulness of exposing such functionality? |
I've skimmed over this PR but I'll take a more in-depth look at it once we have resolved the PR in the duckdb-go-bindings. :) |
yep. |
This allows a table UDF to be backed by an Arrow array (or table) by exposing a new
MoveArrowToDataChunkmethod from the bindings layer (added in duckdb/duckdb-go-bindings#47) which will copy and consume an arrow record batch into an existing duckdb data chunk.A test was added that demonstrates how to create a UDF like this.