Description
I am trying to optimize the following function:
c = (a * b).sum('x', skipna=False)
where a and b are xarray.DataArray objects, both with dimension x and both with a dask backend.
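For concreteness, the unchunked computation can be reproduced with plain NumPy (a hypothetical example; the shapes and names are illustrative, and einsum stands in for the fused multiply-reduce over the core dimension):

```python
import numpy as np

# Illustrative stand-ins for the dask-backed DataArrays; 'x' is the last axis.
rng = np.random.default_rng(0)
a = rng.random((4, 100))
b = rng.random((4, 100))

# Equivalent of (a * b).sum('x', skipna=False): multiply and reduce over 'x'
# in a single fused pass, instead of materializing the intermediate product.
c = np.einsum('yx,yx->y', a, b)

assert np.allclose(c, (a * b).sum(axis=-1))
```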
I successfully obtained a 5.5x speedup with the following:
import numba

@numba.guvectorize(['void(float64[:], float64[:], float64[:])'], '(n),(n)->()', nopython=True, cache=True)
def mulsum(a, b, res):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    res.flat[0] = acc
c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float])
The problem is that this introduces a constraint, quite problematic in my case, that a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (so it can't work for, e.g., an interpolator, which would have to rely on dask ghosting, i.e. overlapping chunks).
Proposal
Add a parameter to apply_ufunc, reduce_func=None. reduce_func is a function that takes two partial results produced by func and combines them into one. apply_ufunc will invoke it whenever there's chunking on an input_core_dim.
e.g. my use case above would simply become:
c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float], reduce_func=operator.add)
So if a and b have 2 chunks each on dimension x, apply_ufunc will internally do
c1 = mulsum(a1, b1)
c2 = mulsum(a2, b2)
c = operator.add(c1, c2)
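The proposed semantics can be sketched without dask at all; here is a pure-Python toy version (the plain-Python mulsum and the two-chunk split are illustrative, not part of the proposal):

```python
import operator
from functools import reduce

# Same kernel as above, minus the numba decorator.
def mulsum(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two chunks along the core dimension 'x'.
a1, a2 = [1.0, 2.0], [3.0, 4.0]
b1, b2 = [10.0, 20.0], [30.0, 40.0]

# What apply_ufunc would do internally with reduce_func=operator.add:
# apply the kernel per chunk, then fold the partial results together.
partials = [mulsum(a1, b1), mulsum(a2, b2)]
c = reduce(operator.add, partials)

# Identical to applying the kernel to the unchunked arrays.
assert c == mulsum(a1 + a2, b1 + b2)
```

Note that this only works because addition is associative and commutative, which is exactly the property reduce_func would need.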
Note that reduce_func will be invoked only in presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep raising an error, as it does today.
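In the meantime, the behaviour can be emulated by slicing the arrays manually. Below is a minimal NumPy sketch; apply_with_reduce is a hypothetical helper (not an xarray API), and chunks here is an explicit list of chunk sizes along the core axis:

```python
import operator
from functools import reduce

import numpy as np

def apply_with_reduce(func, a, b, axis, chunks, reduce_func):
    """Hypothetical helper: apply `func` to each chunk along `axis`
    and combine the partial results with `reduce_func`."""
    bounds = np.cumsum([0] + list(chunks))
    partials = [
        func(np.take(a, range(lo, hi), axis=axis),
             np.take(b, range(lo, hi), axis=axis))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    return reduce(reduce_func, partials)

rng = np.random.default_rng(1)
a = rng.random((3, 10))
b = rng.random((3, 10))

# Stand-in for the guvectorized kernel: multiply-reduce over the last axis.
mulsum = lambda x, y: np.einsum('yx,yx->y', x, y)

c = apply_with_reduce(mulsum, a, b, axis=1, chunks=[4, 6],
                      reduce_func=operator.add)
assert np.allclose(c, (a * b).sum(axis=1))
```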