Skip to content

Conversation

@lidavidm
Copy link
Member

This implements a kernel equivalent to NumPy's arr[mask] = [values], i.e. given an array and an equal-length (or scalar) boolean mask, along with an array of replacement values passed via options, each array item for which the corresponding mask value is true is replaced with the next value from the replacement value array.

@lidavidm
Copy link
Member Author

Note once #10410 goes through, this should probably be consolidated into the same file (scalar_if_else.cc).

@github-actions
Copy link

@lidavidm
Copy link
Member Author

(N.B. needs some more work - trying to add a test with random data)

@lidavidm lidavidm marked this pull request as draft May 27, 2021 16:07
@lidavidm lidavidm marked this pull request as ready for review May 28, 2021 13:27
@jorisvandenbossche
Copy link
Member

High-level question: is there a reason that the replacement values are passed through an options struct, and not as third argument? (because that is an array of a different length? But eg for "take" the indices are not in an options struct)

And if we can start bike-shedding about the name .. ;) For me, "override_mask" sounds like it would replace the validity mask of the array. So if we use "override" as verb, I would at least make it "overrride_with_mask" or so. But "replace" or "set_values" might also be possible verbs (and with the different variations of this kernel, it could eg be "replace_with_mask", "replace_with_indices", "replace_with_mapping")

@lidavidm
Copy link
Member Author

lidavidm commented Jun 2, 2021

It's in a struct because it's a different length, yes. We could make it a vector kernel instead of a scalar kernel and then the replacements could be passed as another argument. I'm not sure which would be more useful though.

For the name, replace_with_mask etc. sound good. CC @nirandaperera so he's aware for ARROW-9431 as well.

@nirandaperera
Copy link
Contributor

@lidavidm @jorisvandenbossche @bkietz I'm also thinking about how to handle different length'd arrays for ARROW-9431. Like David said, compute infrastructure guarantees that all arrays passed to the function are of the same length. If we are going ahead with the Options approach, for ARROW-9431, it would be a unary function and other 2 arrays in the options.
So should we make ARROW-9430 and ARROW-9431 vector kernels instead?

@lidavidm lidavidm changed the title ARROW-9430: [C++] Implement override_mask kernel ARROW-9430: [C++] Implement replace_with_mask kernel Jun 2, 2021
@nirandaperera
Copy link
Contributor

BTW we need to add docs to the PR I think 🙂

@lidavidm
Copy link
Member Author

lidavidm commented Jun 2, 2021

Yup, good catch, though maybe let's decide if this is to be a vector or scalar kernel (as I also need to add Python bindings)

@lidavidm lidavidm marked this pull request as draft June 2, 2021 15:51
@lidavidm
Copy link
Member Author

lidavidm commented Jun 2, 2021

This is now a vector kernel, with docs, that should support any fixed-size type. However I still need to add support for binary types which I expect people would want to use.

@lidavidm lidavidm marked this pull request as ready for review June 3, 2021 17:24
@bkietz
Copy link
Member

bkietz commented Jun 3, 2021

FWIW: I'd guess this should be a vector kernel. As a thought experiment: I don't think it'd ever be correct to use this function from a query expression even in its scalar form since that'd require prior knowledge of the number of set bits in the mask

@jorisvandenbossche
Copy link
Member

@bkietz I think it's correct that this will probably not be used in a typical query execution context. Its main target is to allow to mimic "setitem" operations in eg pandas (arr[mask] = val).

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I added a few small doc comments (didn't look at the C++ implementation in detail). The ReplaceWithOptions class still needs to be exposed in Python (you get an import warning at the moment about it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// to a true value in the mask with the next element from `options`.
/// to a true value in the mask with the next element from `replacements`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"each corresponding element of the mask will be replaced by the next "
"each corresponding true value of the mask will be replaced by the next "

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made no sense on a re-read so I reworded the docs here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think popcnt will not be clear for many users. Maybe something like sum(mask == true) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-fixed width and binary is now also supported already?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"value of the replacements (or null if the mask is null)."
"value of the replacements (or null if the mask is null). "

@lidavidm
Copy link
Member Author

lidavidm commented Jun 4, 2021

I removed ReplaceWithOptions when I made this a vector kernel, so I fixed up the docs - thanks for catching that.

@lidavidm lidavidm force-pushed the arrow-9430 branch 2 times, most recently from 16779ec to cf41f39 Compare June 14, 2021 12:32
@lidavidm
Copy link
Member Author

@bkietz @jorisvandenbossche I know y'all are busy, but any other comments? Once this is in, @nirandaperera can get started on ARROW-9431 on top of this

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another pass. Looking good to me (but I just checked the docs and code logic, for the C++ implementation details I will defer to someone more knowledgeable)

Should there maybe be some tests where the array, mask or replacement are sliced? (have an offset) Or is that already covered / not a typical risk to go wrong?

Comment on lines 40 to 42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARROW_ASSIGN_OR_RAISE(auto array,
MakeArrayOfNull(array.type, array.length, ctx->memory_pool()));
*output = *array->data();
ARROW_ASSIGN_OR_RAISE(auto replacement_array,
MakeArrayOfNull(array.type, array.length, ctx->memory_pool()));
*output = *replacement_array->data();

(to be consistent with below, and not confuse with the input array which uses the same variable name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to mention this is setting the out_bitmap to all valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm I think it will be better to use BitUtil::SetBitsTo/SetBitmap here. It would more precisely set values upto the [offset, offset+length).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might over-allocate a bit and we should make sure any such bytes are initialized. Also I'd guess it's faster to just bit-blit a constant value over a buffer rather than try to set bits precisely. (SetBitsTo i s quite a bit more complicated.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we wanted to support can_write_into_slices then we'd need SetBitsTo here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would prevent stepping on other slice's result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hasn't this already been done in the else if (valid_block.NoneSet()) { block above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That branch only applies some of the time, but you are right in that there's no point doing it above since it's replicated here. (However the branch should be kept to skip those values.)

@lidavidm
Copy link
Member Author

The ReplaceWithMaskRandom tests do test sliced arrays. (Unlike the scalar tests we can't test slices just by slicing the given example inputs, unfortunately.)

@jorisvandenbossche
Copy link
Member

@bkietz do you want to take another look here?

@lidavidm
Copy link
Member Author

Note there's some code here for handling fixed-width types that now duplicates what's in ARROW-13064/#10557. We should probably unify those at some point (after one or the other merges).

@ursabot
Copy link

ursabot commented Jul 8, 2021

Supported benchmark command examples:

@ursabot benchmark help

To run all benchmarks:
@ursabot please benchmark

To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java

To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*

To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3

For other command=cpp-micro options, please see https://github.com/ursacomputing/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py

@lidavidm
Copy link
Member Author

lidavidm commented Jul 8, 2021

@ursabot please benchmark lang=C++

@ursabot
Copy link

ursabot commented Jul 8, 2021

Benchmark runs are scheduled for baseline = 7eea2f5 and contender = f79438d96c42b7728c3f9860aadad545cc5ac483. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@lidavidm
Copy link
Member Author

Final comments from anyone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be called for a FixedWidthType execution? If so, this might conflict with computed_preallocate option. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same applies to other mem allocations in the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this needs to be a bit smarter.

I also notice that it doesn't handle the case where len(replacements) > len(mask).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const CopyBitmap copy_bitmap, const uint8_t* mask_bitmap,
const CopyBitmap &copy_bitmap, const uint8_t* mask_bitmap,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I intentionally made CopyBitmap itself cheaper to copy than to use as a reference - it's <= 2 words though I suppose the compiler will optimize it identically either way.

Except, doing this does seem about ~5% slower:

Before:
-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
ReplaceWithMaskLowSelectivityBench/16384/0        33631 ns        33631 ns       204971 bytes_per_second=3.62975G/s
ReplaceWithMaskLowSelectivityBench/16384/99       35018 ns        35017 ns       202363 bytes_per_second=3.46498G/s
ReplaceWithMaskHighSelectivityBench/16384/0       77268 ns        77267 ns        90912 bytes_per_second=1.57985G/s
ReplaceWithMaskHighSelectivityBench/16384/99      75751 ns        75750 ns        92444 bytes_per_second=1.60176G/s

After:
-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
ReplaceWithMaskLowSelectivityBench/16384/0        35512 ns        35511 ns       192582 bytes_per_second=3.43751G/s
ReplaceWithMaskLowSelectivityBench/16384/99       36702 ns        36701 ns       191996 bytes_per_second=3.30598G/s
ReplaceWithMaskHighSelectivityBench/16384/0       82957 ns        82956 ns        85194 bytes_per_second=1.47151G/s
ReplaceWithMaskHighSelectivityBench/16384/99      80415 ns        80413 ns        86354 bytes_per_second=1.50887G/s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting 😄

Comment on lines 158 to 165
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can create a BooleanArray from mask, then we can call true_count straight away! It does this internally IINM.

Comment on lines +471 to +507
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that we should leave all kernels as following,

kernel.null_handling = NullHandling::type::COMPUTED_NO_PREALLOCATE;
kernel.mem_allocation = MemAllocation::type::NO_PREALLOCATE;

and later, change these flags. One thing I realized was, NullHandling::COMPUTED_PREALLOCATE, and MemAllocation::PREALLOCATE introduces a lot of niche cases. It was helpful for me to test those cases using compute::CheckScalar test util. It checks for slicing, chunks etc for scalar functions.

void CheckScalar(std::string func_name, const DatumVector& inputs,

But now that we are on vector functions, the semantics might change :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkietz WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate that we don't have a versatile utility for vector functions like CheckScalar. One way to verify correct writing into slices would be: run the function once to ensure output is correctly allocated/shaped/etc, then invoke the kernel directly into a slice of that output. If everything is working as it should, the kernel should simply overwrite that slice with new values, leaving values outside the slice untouched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually: I'm not sure if it even makes sense for this kernel to write into a slice, since it needs the entirety of all its arguments to execute. So if we manually force it to write into a slice of the output, it'd write different results.

@lidavidm
Copy link
Member Author

I think I've addressed all the feedback here.

@nirandaperera
Copy link
Contributor

I think I've addressed all the feedback here.

I'm +1 for this!

@lidavidm lidavidm closed this in 6db88a9 Jul 14, 2021
@lidavidm
Copy link
Member Author

Merged, thanks. This should unblock ARROW-9431 if you do still plan to look at it.

@jorisvandenbossche
Copy link
Member

Nice to see this one merged, thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants