Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Sep 3, 2021

No description provided.

@github-actions
Copy link

github-actions bot commented Sep 3, 2021

@lidavidm lidavidm marked this pull request as draft September 7, 2021 18:41
@lidavidm
Copy link
Member Author

lidavidm commented Sep 7, 2021

Converting to draft while I work out the Windows CI issues.

@lidavidm lidavidm marked this pull request as ready for review September 8, 2021 13:25
@ianmcook ianmcook requested a review from bkietz September 8, 2021 14:21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop iterates for Scalars but not for Arrays?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the stated purpose of the loop. This is to optimize a special case where we have 0 or more all-null arguments followed by an all-non-null argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: In the first loop, you check for datum.is_array() but not here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check above is not just for array. Here there is no need since we know it is either an array or a scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit cryptic if you don't know a union is expected. What about e.g. union{number: int8 = -29}?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this isn't going to be very performant compared to e.g. raw_builder->Append(source_array.GetView(i)). Not necessarily worth addressing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we'd have the overhead of a virtual call and some other bookkeeping vs just a direct append call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to add a test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks for the catch. After looking things over I ended up reworking DispatchBest here, adding checks to ensure parameterized types have the same parameters, and adding some basic tests of the helpers we use for promotions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, one more thing I want to do now is have CheckDispatchBest also ensure that the promoted ValueDescrs from DispatchBest match the ones given to the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I also adjusted some tests and added a set of tests specifically for the type promotion helpers we use.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Just a couple more comments and questions below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a reminder that we'd like a std::span backport at some point ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... if saw_date64 is true but saw_date32 false, we should still return date64, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still examine other types in case they are incompatible, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an error in case BasicDecimal256::kMaxPrecision is exceeded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this makes the name of the function a bit weird ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace "identical" with "compatible"? After all, some implicit casting is allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you could simply have:

template <typename Type>
struct CoalesceFunctor<Type,
    std::enable_if<is_nested_type<Type>::value && !is_union_type<Type>::value>::type> {
  // common implementation for non-union nested types
};

{decimal128(2, 1), decimal128(2, 1)});
CheckDispatchBest(name, {decimal256(2, 1), decimal256(2, 1)},
{decimal256(3, 1), decimal256(3, 1)});
{decimal256(2, 1), decimal256(2, 1)});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising to me. Could you comment on why the implicit cast is no longer necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DispatchBest in this case doesn't actually promote either argument. When I adjusted CheckDispatchBest to actually compare the final types against the given types, it exposed discrepancies like this that I fixed. In the logic here, for add/subtract, we do not scale up with the scales are the same:

switch (promotion) {
case DecimalPromotion::kAdd: {
left_scaleup = std::max(s1, s2) - s1;
right_scaleup = std::max(s1, s2) - s2;
break;
}

@lidavidm
Copy link
Member Author

Just a ping here for either @pitrou or @bkietz 🙂

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you very much!

@pitrou pitrou closed this in b60bbb3 Sep 29, 2021
@lidavidm lidavidm deleted the arrow-13390 branch September 29, 2021 11:57
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
Closes apache#11080 from lidavidm/arrow-13390

Authored-by: David Li <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants