Description
I am trying to optimize the following function:
c = (a * b).sum('x', skipna=False)
where a and b are xarray.DataArray objects, both with dimension x and both with a dask backend.
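For concreteness, the unchunked computation can be reproduced with plain NumPy (a hypothetical example; the shapes and names are illustrative, and einsum stands in for the fused multiply-reduce over the core dimension):

```python
import numpy as np

# Illustrative stand-ins for the dask-backed DataArrays; 'x' is the last axis.
rng = np.random.default_rng(0)
a = rng.random((4, 100))
b = rng.random((4, 100))

# Equivalent of (a * b).sum('x', skipna=False): multiply and reduce over 'x'
# in a single fused pass, instead of materializing the intermediate product.
c = np.einsum('yx,yx->y', a, b)

assert np.allclose(c, (a * b).sum(axis=-1))
```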
I successfully obtained a 5.5x speedup with the following:
import numba

@numba.guvectorize(['void(float64[:], float64[:], float64[:])'], '(n),(n)->()', nopython=True, cache=True)
def mulsum(a, b, res):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    res.flat[0] = acc
c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float])
The problem is that this introduces a constraint, quite problematic in my case, that a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (so it can't work for, e.g., an interpolator, which would have to rely on dask ghosting, i.e. overlapping chunks).
Proposal
Add a parameter to apply_ufunc, reduce_func=None. reduce_func is a function that takes two partial results produced by func and combines them into one. apply_ufunc will invoke it whenever there's chunking on an input_core_dim.
e.g. my use case above would simply become:
c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float], reduce_func=operator.add)
So if a and b have 2 chunks each on dimension x, apply_ufunc will internally do
c1 = mulsum(a1, b1)
c2 = mulsum(a2, b2)
c = operator.add(c1, c2)
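The proposed semantics can be sketched without dask at all; here is a pure-Python toy version (the plain-Python mulsum and the two-chunk split are illustrative, not part of the proposal):

```python
import operator
from functools import reduce

# Same kernel as above, minus the numba decorator.
def mulsum(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two chunks along the core dimension 'x'.
a1, a2 = [1.0, 2.0], [3.0, 4.0]
b1, b2 = [10.0, 20.0], [30.0, 40.0]

# What apply_ufunc would do internally with reduce_func=operator.add:
# apply the kernel per chunk, then fold the partial results together.
partials = [mulsum(a1, b1), mulsum(a2, b2)]
c = reduce(operator.add, partials)

# Identical to applying the kernel to the unchunked arrays.
assert c == mulsum(a1 + a2, b1 + b2)
```

Note that this only works because addition is associative and commutative, which is exactly the property reduce_func would need.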
Note that reduce_func will be invoked only in presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep raising an error, as it does today.
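In the meantime, the behaviour can be emulated by slicing the arrays manually. Below is a minimal NumPy sketch; apply_with_reduce is a hypothetical helper (not an xarray API), and chunks here is an explicit list of chunk sizes along the core axis:

```python
import operator
from functools import reduce

import numpy as np

def apply_with_reduce(func, a, b, axis, chunks, reduce_func):
    """Hypothetical helper: apply `func` to each chunk along `axis`
    and combine the partial results with `reduce_func`."""
    bounds = np.cumsum([0] + list(chunks))
    partials = [
        func(np.take(a, range(lo, hi), axis=axis),
             np.take(b, range(lo, hi), axis=axis))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    return reduce(reduce_func, partials)

rng = np.random.default_rng(1)
a = rng.random((3, 10))
b = rng.random((3, 10))

# Stand-in for the guvectorized kernel: multiply-reduce over the last axis.
mulsum = lambda x, y: np.einsum('yx,yx->y', x, y)

c = apply_with_reduce(mulsum, a, b, axis=1, chunks=[4, 6],
                      reduce_func=operator.add)
assert np.allclose(c, (a * b).sum(axis=1))
```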