Skip to content

Running Python UDFs in Weld. #523

@kchasialis

Description

@kchasialis

I am trying to run a UDF pipeline on a dataset using Weld (or grizzly, I suppose).

Grizzly, however, (as far as I know) does not offer an optimized function to apply for example a scalar UDF on a specific column of the dataset.

I found that one way to do it is to access the internal data using to_pandas() which has a function called “apply” and use this function to run a Python UDF on a column.

The problem is that I want to measure Weld’s performance on UDFs and by accessing the internal data and applying the functions just like a normal python program would do is not a fair way to measure Weld’s performance regarding (Python) UDF execution.

How can I apply a python UDF on a column of the dataset in an optimized way using Weld?

Thanks in advance!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions