-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13390: [C++] Implement coalesce for remaining types #11080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
03411e6 to
2b34b47
Compare
|
Converting to draft while I work out the Windows CI issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop iterates for Scalars but not for Arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the stated purpose of the loop. This is to optimize a special case where we have 0 or more all-null arguments followed by an all-non-null argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In the first loop, you check for datum.is_array() but not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check above is not just for array. Here there is no need since we know it is either an array or a scalar.
85dc802 to
8a71120
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit cryptic if you don't know a union is expected. What about e.g. union{number: int8 = -29}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this isn't going to be very performant compared to e.g. raw_builder->Append(source_array.GetView(i)). Not necessarily worth addressing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we'd have the overhead of a virtual call and some other bookkeeping vs just a direct append call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, thanks for the catch. After looking things over I ended up reworking DispatchBest here, adding checks to ensure parameterized types have the same parameters, and adding some basic tests of the helpers we use for promotions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, one more thing I want to do now is have CheckDispatchBest also ensure that the promoted ValueDescrs from DispatchBest match the ones given to the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - I also adjusted some tests and added a set of tests specifically for the type promotion helpers we use.
8a71120 to
0d6c742
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Just a couple more comments and questions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a reminder that we'd like a std::span backport at some point ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... if saw_date64 is true but saw_date32 false, we should still return date64, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still examine other types in case they are incompatible, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an error in case BasicDecimal256::kMaxPrecision is exceeded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this makes the name of the function a bit weird ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace "identical" with "compatible"? After all, some implicit casting is allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you could simply have:
template <typename Type>
struct CoalesceFunctor<Type,
std::enable_if<is_nested_type<Type>::value && !is_union_type<Type>::value>::type> {
// common implementation for non-union nested types
};f004db9 to
887756f
Compare
887756f to
3058559
Compare
| {decimal128(2, 1), decimal128(2, 1)}); | ||
| CheckDispatchBest(name, {decimal256(2, 1), decimal256(2, 1)}, | ||
| {decimal256(3, 1), decimal256(3, 1)}); | ||
| {decimal256(2, 1), decimal256(2, 1)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising to me. Could you comment on why the implicit cast is no longer necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DispatchBest in this case doesn't actually promote either argument. When I adjusted CheckDispatchBest to actually compare the final types against the given types, it exposed discrepancies like this that I fixed. In the logic here, for add/subtract, we do not scale up with the scales are the same:
arrow/cpp/src/arrow/compute/kernels/codegen_internal.cc
Lines 268 to 273 in 3317f83
| switch (promotion) { | |
| case DecimalPromotion::kAdd: { | |
| left_scaleup = std::max(s1, s2) - s1; | |
| right_scaleup = std::max(s1, s2) - s2; | |
| break; | |
| } |
Co-authored-by: Benjamin Kietzman <[email protected]>
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you very much!
Closes apache#11080 from lidavidm/arrow-13390 Authored-by: David Li <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
No description provided.