-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9430: [C++] Implement replace_with_mask kernel #10412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note once #10410 goes through, this should probably be consolidated into the same file ( |
|
(N.B. needs some more work - trying to add a test with random data) |
|
High-level question: is there a reason that the replacement values are passed through an options struct, and not as third argument? (because that is an array of a different length? But eg for "take" the indices are not in an options struct) And if we can start bike-shedding about the name .. ;) For me, "override_mask" sounds like it would replace the validity mask of the array. So if we use "override" as verb, I would at least make it "overrride_with_mask" or so. But "replace" or "set_values" might also be possible verbs (and with the different variations of this kernel, it could eg be "replace_with_mask", "replace_with_indices", "replace_with_mapping") |
|
It's in a struct because it's a different length, yes. We could make it a vector kernel instead of a scalar kernel and then the replacements could be passed as another argument. I'm not sure which would be more useful though. For the name, replace_with_mask etc. sound good. CC @nirandaperera so he's aware for ARROW-9431 as well. |
|
@lidavidm @jorisvandenbossche @bkietz I'm also thinking about how to handle different length'd arrays for ARROW-9431. Like David said, compute infrastructure guarantees that all arrays passed to the function are of the same length. If we are going ahead with the |
|
BTW we need to add docs to the PR I think 🙂 |
|
Yup, good catch, though maybe let's decide if this is to be a vector or scalar kernel (as I also need to add Python bindings) |
|
This is now a vector kernel, with docs, that should support any fixed-size type. However I still need to add support for binary types which I expect people would want to use. |
|
FWIW: I'd guess this should be a vector kernel. As a thought experiment: I don't think it'd ever be correct to use this function from a query expression even in its scalar form since that'd require prior knowledge of the number of set bits in the mask |
|
@bkietz I think it's correct that this will probably not be used in a typical query execution context. Its main target is to allow to mimic "setitem" operations in eg pandas ( |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I added a few small doc comments (didn't look at the C++ implementation in detail). The ReplaceWithOptions class still needs to be exposed in Python (you get an import warning at the moment about it)
cpp/src/arrow/compute/api_vector.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// to a true value in the mask with the next element from `options`. | |
| /// to a true value in the mask with the next element from `replacements`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "each corresponding element of the mask will be replaced by the next " | |
| "each corresponding true value of the mask will be replaced by the next " |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made no sense on a re-read so I reworded the docs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think popcnt will not be clear for many users. Maybe something like sum(mask == true) ?
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-fixed width and binary is now also supported already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "value of the replacements (or null if the mask is null)." | |
| "value of the replacements (or null if the mask is null). " |
|
I removed |
16779ec to
cf41f39
Compare
|
@bkietz @jorisvandenbossche I know y'all are busy, but any other comments? Once this is in, @nirandaperera can get started on ARROW-9431 on top of this |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did another pass. Looking good to me (but I just checked the docs and code logic, for the C++ implementation details I will defer to someone more knowledgeable)
Should there maybe be some tests where the array, mask or replacement are sliced? (have an offset) Or is that already covered / not a typical risk to go wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ARROW_ASSIGN_OR_RAISE(auto array, | |
| MakeArrayOfNull(array.type, array.length, ctx->memory_pool())); | |
| *output = *array->data(); | |
| ARROW_ASSIGN_OR_RAISE(auto replacement_array, | |
| MakeArrayOfNull(array.type, array.length, ctx->memory_pool())); | |
| *output = *replacement_array->data(); |
(to be consistent with below, and not confuse with the input array which uses the same variable name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to mention this is setting the out_bitmap to all valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm I think it will be better to use BitUtil::SetBitsTo/SetBitmap here. It would more precisely set values upto the [offset, offset+length).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might over-allocate a bit and we should make sure any such bytes are initialized. Also I'd guess it's faster to just bit-blit a constant value over a buffer rather than try to set bits precisely. (SetBitsTo i s quite a bit more complicated.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if we wanted to support can_write_into_slices then we'd need SetBitsTo here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would prevent stepping on other slice's result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hasn't this already been done in the else if (valid_block.NoneSet()) { block above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That branch only applies some of the time, but you are right in that there's no point doing it above since it's replicated here. (However the branch should be kept to skip those values.)
|
The |
|
@bkietz do you want to take another look here? |
|
Note there's some code here for handling fixed-width types that now duplicates what's in ARROW-13064/#10557. We should probably unify those at some point (after one or the other merges). |
|
Supported benchmark command examples:
To run all benchmarks: To filter benchmarks by language: To filter Python and R benchmarks by name: To filter C++ benchmarks by archery --suite-filter and --benchmark-filter: For other |
|
@ursabot please benchmark lang=C++ |
|
Benchmark runs are scheduled for baseline = 7eea2f5 and contender = f79438d96c42b7728c3f9860aadad545cc5ac483. Results will be available as each benchmark for each run completes. |
|
Final comments from anyone? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be called for a FixedWidthType execution? If so, this might conflict with computed_preallocate option. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same applies to other mem allocations in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this needs to be a bit smarter.
I also notice that it doesn't handle the case where len(replacements) > len(mask).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const CopyBitmap copy_bitmap, const uint8_t* mask_bitmap, | |
| const CopyBitmap ©_bitmap, const uint8_t* mask_bitmap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I intentionally made CopyBitmap itself cheaper to copy than to use as a reference - it's <= 2 words though I suppose the compiler will optimize it identically either way.
Except, doing this does seem about ~5% slower:
Before:
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
ReplaceWithMaskLowSelectivityBench/16384/0 33631 ns 33631 ns 204971 bytes_per_second=3.62975G/s
ReplaceWithMaskLowSelectivityBench/16384/99 35018 ns 35017 ns 202363 bytes_per_second=3.46498G/s
ReplaceWithMaskHighSelectivityBench/16384/0 77268 ns 77267 ns 90912 bytes_per_second=1.57985G/s
ReplaceWithMaskHighSelectivityBench/16384/99 75751 ns 75750 ns 92444 bytes_per_second=1.60176G/s
After:
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
ReplaceWithMaskLowSelectivityBench/16384/0 35512 ns 35511 ns 192582 bytes_per_second=3.43751G/s
ReplaceWithMaskLowSelectivityBench/16384/99 36702 ns 36701 ns 191996 bytes_per_second=3.30598G/s
ReplaceWithMaskHighSelectivityBench/16384/0 82957 ns 82956 ns 85194 bytes_per_second=1.47151G/s
ReplaceWithMaskHighSelectivityBench/16384/99 80415 ns 80413 ns 86354 bytes_per_second=1.50887G/s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can create a BooleanArray from mask, then we can call true_count straight away! It does this internally IINM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that we should leave all kernels as following,
kernel.null_handling = NullHandling::type::COMPUTED_NO_PREALLOCATE;
kernel.mem_allocation = MemAllocation::type::NO_PREALLOCATE;
and later, change these flags. One thing I realized was, NullHandling::COMPUTED_PREALLOCATE, and MemAllocation::PREALLOCATE introduces a lot of niche cases. It was helpful for me to test those cases using compute::CheckScalar test util. It checks for slicing, chunks etc for scalar functions.
| void CheckScalar(std::string func_name, const DatumVector& inputs, |
But now that we are on vector functions, the semantics might change :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkietz WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unfortunate that we don't have a versatile utility for vector functions like CheckScalar. One way to verify correct writing into slices would be: run the function once to ensure output is correctly allocated/shaped/etc, then invoke the kernel directly into a slice of that output. If everything is working as it should, the kernel should simply overwrite that slice with new values, leaving values outside the slice untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually: I'm not sure if it even makes sense for this kernel to write into a slice, since it needs the entirety of all its arguments to execute. So if we manually force it to write into a slice of the output, it'd write different results.
|
I think I've addressed all the feedback here. |
I'm +1 for this! |
|
Merged, thanks. This should unblock ARROW-9431 if you do still plan to look at it. |
|
Nice to see this one merged, thanks all! |
This implements a kernel equivalent to NumPy's
arr[mask] = [values], i.e. given an array and an equal-length (or scalar) boolean mask, along with an array of replacement values passed via options, each array item for which the corresponding mask value istrueis replaced with the next value from the replacement value array.