Skip to content

Conversation

@rtpsw
Copy link
Contributor

@rtpsw rtpsw commented Jun 13, 2022

@github-actions
Copy link

@vibhatha
Copy link
Collaborator

@rtpsw I did skim through the PR, interesting!.
What is required to get a functional test case to evaluate registering a UDF? I went through the JIRA, but it is not clear what is meant by

registering a function (with an Id) external to the plan

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 13, 2022

@rtpsw I did skim through the PR, interesting!. What is required to get a functional test case to evaluate registering a UDF? I went through the JIRA, but it is not clear what is meant by

registering a function (with an Id) external to the plan

You're right, this isn't trivial. The issue is that Id is defined with two members of type util::string_view, which point to strings that must be held elsewhere while the plan is in scope. This is satisfied for an Id that has these fields pointing to strings parsed from the Substrait plan itself. However, if a user registers a function with an Id whose field values are external to the plan, then the strings they point to should be held while the plan is in scope, so it makes sense to keep the strings on the plan; this is what AddExternalSymbol is meant for.

I intend to add test cases a bit later. This PR is an extraction from a larger project project I'm working on for end-to-end (Ibis/Ibis-Substrait/PyArrow) support for Python-UDFs.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 14, 2022

@vibhatha, I think this PR is ready for review. Are you the one to review it?

@vibhatha
Copy link
Collaborator

@vibhatha, I think this PR is ready for review. Are you the one to review it?

@rtpsw I was reading it now. But I won't be a major reviewer. I will be closely reading and co-review certain parts.

cc @westonpace @lidavidm could you please take a look.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 15, 2022

Right now I see only a minor change waiting for me to make. Let me know if you're still reviewing and I'll hold for your notification.

@lidavidm
Copy link
Member

@vibhatha, I'm not up to date on Acero/Substrait progress anymore. Are the changes here reasonable?

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 15, 2022

@vibhatha, I'm not up to date on Acero/Substrait progress anymore. Are the changes here reasonable?

I have some explanation here, in case it helps. The TBD parts are expected in an upcoming PR (or two) I'll prepare.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 15, 2022

For background on nested registries, see:

@vibhatha
Copy link
Collaborator

@vibhatha, I'm not up to date on Acero/Substrait progress anymore. Are the changes here reasonable?

@lidavidm I am going to go through again. I will check with my knowlege on ACERO/substrait. But it would be better to have another review from @westonpace on this.

@vibhatha
Copy link
Collaborator

cc @westonpace could please take a look.

@vibhatha
Copy link
Collaborator

@vibhatha, I'm not up to date on Acero/Substrait progress anymore. Are the changes here reasonable?

I have some explanation here, in case it helps. The TBD parts are expected in an upcoming PR (or two) I'll prepare.

@rtpsw I added a comment to the JIRA. Appreciate your feedback to clarify the design and usage.

@vibhatha
Copy link
Collaborator

I think we have two important pieces discussed here. One is how Substrait-UDF usage is benefitted and the second is how the function registry usage must be modified. Since the function registry usage is an important piece for the first task, should we address it first and move for the second. Just a thought. We could test the usage of the temporary FR further.

@westonpace westonpace self-requested a review June 16, 2022 21:35
@westonpace
Copy link
Member

Sorry, I'm still catching up from being out earlier this week. I'll take a look at this tomorrow.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skipped over the changes to nested function registries since I already reviewed those (I think) in #13252 . I agree with these additions. It would be nice to move away from Substrait plans requiring users to use the consuming sink node and this is a good way to do that while keeping the convenience if desired.

This will enable custom non-embedded functions to be used in Substrait plans although I'd prefer it to be a bit more automatic (e.g. not requiring a second register call).

The ability to easily create plans for writing is a nice convenience as well.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 17, 2022

I skipped over the changes to nested function registries since I already reviewed those (I think) in #13252 .

I think there are a few additions here, so I'll try to rebase to make the diff clear.

This will enable custom non-embedded functions to be used in Substrait plans although I'd prefer it to be a bit more automatic (e.g. not requiring a second register call).

I'll look into this.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 19, 2022

Rebase done and pushed using git push --force-with-lease origin HEAD.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 23, 2022

@westonpace, is this good to go?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more thoughts.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there any other changes you still needed to make here @rtpsw or is this good to go?

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 30, 2022

This is good to go from my point of view. I have an upcoming PyArrow UDF PR that will use the changes in this one.

@westonpace westonpace merged commit f3bdcce into apache:master Jun 30, 2022
@rtpsw rtpsw deleted the ARROW-16823 branch July 3, 2022 12:15
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants