

@wmTJc9IK0Q

This allows a table UDF to be backed by an Arrow array (or table) by exposing a new MoveArrowToDataChunk method from the bindings layer (added in duckdb/duckdb-go-bindings#47), which copies an Arrow record batch into an existing DuckDB data chunk and consumes the batch in the process.

A test was added that demonstrates how to create a UDF like this.
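To make the "copy and consume" contract concrete, here is a minimal stdlib-only Go sketch of a chunked table function that drains record batches into a reused output chunk. It does not call duckdb-go or arrow-go: recordBatch, dataChunk, moveBatchToChunk, splitIntoBatches and arrowBackedTable are hypothetical stand-ins. The rule that each batch must fit into one 2048-row chunk is an assumption of this sketch, not a documented property of MoveArrowToDataChunk.

```go
package main

import (
	"errors"
	"fmt"
)

// vectorSize mirrors DuckDB's default vector size: a data chunk holds at
// most 2048 rows. (Assumption of this sketch.)
const vectorSize = 2048

// recordBatch is a hypothetical stand-in for an Arrow record batch with a
// single int64 column.
type recordBatch struct {
	values   []int64
	released bool
}

// dataChunk is a hypothetical stand-in for a pre-allocated DuckDB data chunk.
type dataChunk struct {
	values []int64 // capacity: vectorSize rows
	size   int     // number of valid rows
}

func newDataChunk() *dataChunk {
	return &dataChunk{values: make([]int64, vectorSize)}
}

// moveBatchToChunk copies b into the existing chunk c and then releases b,
// so the batch cannot be read or moved again. Hypothetical, not the binding.
func moveBatchToChunk(b *recordBatch, c *dataChunk) error {
	if b.released {
		return errors.New("record batch already consumed")
	}
	if len(b.values) > len(c.values) {
		return fmt.Errorf("batch has %d rows, chunk holds %d", len(b.values), len(c.values))
	}
	c.size = copy(c.values, b.values)
	b.values = nil
	b.released = true
	return nil
}

// splitIntoBatches slices a column into batches of at most vectorSize rows
// so that each batch fits into one chunk.
func splitIntoBatches(col []int64) []*recordBatch {
	var batches []*recordBatch
	for start := 0; start < len(col); start += vectorSize {
		end := start + vectorSize
		if end > len(col) {
			end = len(col)
		}
		batches = append(batches, &recordBatch{values: col[start:end]})
	}
	return batches
}

// arrowBackedTable is a hypothetical chunked table function: every
// FillChunk call moves the next pending batch into the output chunk.
type arrowBackedTable struct {
	pending []*recordBatch
}

// FillChunk returns the number of rows written; 0 signals the end of the table.
func (t *arrowBackedTable) FillChunk(c *dataChunk) (int, error) {
	if len(t.pending) == 0 {
		c.size = 0
		return 0, nil
	}
	b := t.pending[0]
	t.pending = t.pending[1:]
	if err := moveBatchToChunk(b, c); err != nil {
		return 0, err
	}
	return c.size, nil
}

func main() {
	col := make([]int64, 10000)
	for i := range col {
		col[i] = int64(i)
	}
	table := &arrowBackedTable{pending: splitIntoBatches(col)}
	chunk := newDataChunk()
	total := 0
	for {
		n, err := table.FillChunk(chunk)
		if err != nil {
			panic(err)
		}
		if n == 0 {
			break
		}
		total += n
	}
	fmt.Println("rows emitted:", total)
}
```

Releasing the batch inside the move helper keeps ownership simple: once a batch has been handed to a chunk, any second attempt to use it fails loudly instead of silently re-emitting rows.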

Copilot AI review requested due to automatic review settings December 1, 2025 05:32
    // Define a table UDF
    type arrowTableUdf struct {
Author


It could make sense to expose a utility UDF struct for this, such as NewArrowTableUDF(duckdb.Arrow, arrow.Table). Library users would then only need to write the BindArguments function that constructs the arrow.Table from their existing dataset.



Yes, it would be nice to have a separate registration method for Arrow and a separate UDF type for Arrow data. Could you please wrap the existing chunked table UDF or create a new one?


Copilot AI left a comment


Pull request overview

This PR adds support for Arrow-backed table UDFs by exposing the MoveArrowToDataChunk method from the bindings layer. This enables users to implement table UDFs that efficiently transfer data from Arrow RecordBatches directly into DuckDB DataChunks.

Key Changes:

  • Exposed MoveArrowToDataChunk binding across all platform-specific arrow mapping files
  • Renamed DataChunkFromArrow to NewDataChunkFromArrow for consistency with bindings layer
  • Added MoveArrowToDataChunk method to the Arrow type with proper error handling
  • Included comprehensive test demonstrating Arrow-backed table UDF implementation

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • arrowmapping/arrow_mapping.go: Updated to expose the MoveArrowToDataChunk binding and renamed DataChunkFromArrow to NewDataChunkFromArrow
  • arrowmapping/arrow_mapping_darwin_amd64.go: Platform-specific binding updates for macOS AMD64
  • arrowmapping/arrow_mapping_darwin_arm64.go: Platform-specific binding updates for macOS ARM64
  • arrowmapping/arrow_mapping_linux_amd64.go: Platform-specific binding updates for Linux AMD64
  • arrowmapping/arrow_mapping_linux_arm64.go: Platform-specific binding updates for Linux ARM64
  • arrowmapping/arrow_mapping_windows_amd64.go: Platform-specific binding updates for Windows AMD64
  • arrow.go: Added the MoveArrowToDataChunk method to the Arrow type with error handling
  • arrow_test.go: Added a test demonstrating an Arrow-backed table UDF with a 10,000-row dataset


@wmTJc9IK0Q
Author

@VGSML would be good to get your feedback on this one.

@wmTJc9IK0Q
Author

It looks like the arrowmapping changes have to be merged first for CI to pass. I'll wait for a review of the overall idea first.

@VGSML

VGSML commented Dec 1, 2025

@wmTJc9IK0Q it looks good to me, thank you so much!
@taniabogatsch could you take a look at it?

@VGSML

VGSML commented Dec 1, 2025

@wmTJc9IK0Q @taniabogatsch I have started a discussion about future Arrow UDF development; could you join?
#76

@taniabogatsch
Collaborator

> I have started a discussion about future Arrow UDF development; could you join?
>
> What do you think - is it feasible and worthwhile to implement something similar in duckdb-go using the Arrow API?

I've seen the discussion, but since I don't use the duckdb-go package myself, I don't know how helpful my input would be. If I understand correctly, you want to discuss whether exposing such functionality is useful?

@taniabogatsch
Collaborator

taniabogatsch commented Dec 1, 2025

> could you take a look at it?

I've skimmed over this PR, but I'll take a more in-depth look once we have resolved the PR in duckdb-go-bindings. :)

@VGSML

VGSML commented Dec 1, 2025

> > I have started a discussion about future Arrow UDF development; could you join?
> >
> > What do you think - is it feasible and worthwhile to implement something similar in duckdb-go using the Arrow API?
>
> I've seen the discussion, but since I don't use the duckdb-go package myself, I don't know how helpful my input would be. If I understand correctly, you want to discuss whether exposing such functionality is useful?

yep.


Labels: None yet

Projects: None yet


3 participants