ARROW-16823: [C++] Arrow Substrait enhancements for UDF #13375
@rtpsw I did skim through the PR, interesting!
You're right, this isn't trivial. The issue is that I intend to add test cases a bit later. This PR is an extraction from a larger project I'm working on for end-to-end (Ibis/Ibis-Substrait/PyArrow) support for Python UDFs.
@vibhatha, I think this PR is ready for review. Are you the one to review it?
@rtpsw I am reading it now, but I won't be the main reviewer. I will read it closely and co-review certain parts. cc @westonpace @lidavidm, could you please take a look?
Right now I see only a minor change waiting for me to make. Let me know if you're still reviewing and I'll hold for your notification.
@vibhatha, I'm not up to date on Acero/Substrait progress anymore. Are the changes here reasonable?
I have some explanation here, in case it helps. The TBD parts are expected in an upcoming PR (or two) I'll prepare.
For background on nested registries, see:
@lidavidm I am going to go through it again. I will check it against my knowledge of Acero/Substrait, but it would be better to have another review from @westonpace on this.
cc @westonpace, could you please take a look?
@rtpsw I added a comment to the JIRA. I'd appreciate your feedback to clarify the design and usage.
I think we have two important pieces discussed here. One is how Substrait-UDF usage benefits, and the other is how the function registry usage must be modified. Since the function registry usage is an important piece of the first, should we address it first and then move on to the other? Just a thought. We could test the usage of the temporary function registry (FR) further.
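For readers unfamiliar with the idea, the temporary (nested) function registry mentioned above can be pictured with a toy model. The sketch below is not Arrow's API: `ToyFunctionRegistry`, `add_function`, `has_function`, and `get_function` are hypothetical names, and the lookup and conflict rules are assumptions made purely for illustration (look in the nested registry first, fall back to the parent, and refuse to shadow a name the parent already provides).

```python
from typing import Callable, Dict, Optional


class ToyFunctionRegistry:
    """Toy stand-in for a function registry; NOT Arrow's actual API."""

    def __init__(self, parent: Optional["ToyFunctionRegistry"] = None) -> None:
        self._parent = parent
        self._functions: Dict[str, Callable] = {}

    def get_function(self, name: str) -> Callable:
        # Walk from this registry up through its ancestors.
        registry: Optional["ToyFunctionRegistry"] = self
        while registry is not None:
            if name in registry._functions:
                return registry._functions[name]
            registry = registry._parent
        raise KeyError(f"no function named: {name}")

    def has_function(self, name: str) -> bool:
        try:
            self.get_function(name)
        except KeyError:
            return False
        return True

    def add_function(self, name: str, func: Callable) -> None:
        # Assumed rule: a name visible here (own or inherited) cannot be re-registered.
        if self.has_function(name):
            raise KeyError(f"function already registered: {name}")
        self._functions[name] = func


# The long-lived registry holding built-in functions.
default_registry = ToyFunctionRegistry()
default_registry.add_function("add", lambda a, b: a + b)

# A temporary registry scoped to one plan; UDFs added here never touch the default one.
plan_registry = ToyFunctionRegistry(parent=default_registry)
plan_registry.add_function("my_udf", lambda x: x * 2)
```

In this model, functions registered for a single plan stay in the nested registry and disappear with it, while built-in functions remain reachable through the parent.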
Sorry, I'm still catching up from being out earlier this week. I'll take a look at this tomorrow.
westonpace
left a comment
I skipped over the changes to nested function registries since I already reviewed those (I think) in #13252. I agree with these additions. It would be nice to move away from Substrait plans requiring users to use the consuming sink node and this is a good way to do that while keeping the convenience if desired.
This will enable custom non-embedded functions to be used in Substrait plans although I'd prefer it to be a bit more automatic (e.g. not requiring a second register call).
The ability to easily create plans for writing is a nice convenience as well.
I think there are a few additions here, so I'll try to rebase to make the diff clear.
I'll look into this.
Rebase done and pushed using
@westonpace, is this good to go?
westonpace
left a comment
Just a few more thoughts.
westonpace
left a comment
Were there any other changes you still needed to make here, @rtpsw, or is this good to go?
This is good to go from my point of view. I have an upcoming PyArrow UDF PR that will use the changes in this one.
See https://issues.apache.org/jira/browse/ARROW-16823
Authored-by: Yaron Gvili <[email protected]>
Signed-off-by: Weston Pace <[email protected]>