Running Python UDFs in Weld.

I am trying to run a UDF pipeline on a dataset using Weld (or rather Grizzly, I suppose).

However, as far as I know, Grizzly does not offer an optimized way to apply, for example, a scalar UDF to a specific column of the dataset.

One way I found is to get the underlying data with to_pandas() and then use its “apply” function to run a Python UDF on a column.
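For reference, the workaround looks roughly like the sketch below. It is only an illustration under assumptions: the DataFrame is built directly with pandas as a stand-in for whatever to_pandas() returns, and the column names and the UDF (scale_and_shift) are made up for the example.

```python
import pandas as pd


def scale_and_shift(x: int) -> int:
    """Hypothetical scalar UDF: operates on one value at a time."""
    return x * 2 + 1


# Stand-in for the frame obtained via to_pandas(); the columns are invented.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply calls the Python function once per element of column "a",
# so the work happens in the Python interpreter, not in Weld.
df["a_transformed"] = df["a"].apply(scale_and_shift)

print(df)
```

Because every call goes through the interpreter, timing this measures pandas and Python rather than anything Weld does.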

The problem is that I want to measure Weld’s performance on UDFs. Accessing the internal data and applying the functions just as a normal Python program would is not a fair way to measure Weld’s performance on (Python) UDF execution.

How can I apply a Python UDF to a column of the dataset in an optimized way using Weld?

Thanks in advance!


Running Python UDFs in Weld. #523
