Skip to content

Conversation

@zeroshade
Copy link
Member

@zeroshade zeroshade commented Sep 4, 2024

Rationale for this change

Widening the Decimal128/256 type to allow for bitwidths of 32 and 64 allows for more interoperability with other libraries and utilities which already support these types. This provides even more opportunities for zero-copy interactions between things such as libcudf and various databases.

What changes are included in this PR?

This PR contains the basic C++ implementations for Decimal32/Decimal64 types, arrays, builders and scalars. It also includes the minimum necessary to get everything compiling and tests passing without also extending the acero kernels and parquet handling (both of which will be handled in follow-up PRs).

Are these changes tested?

Yes, tests were extended where applicable to add decimal32/decimal64 cases.

Are there any user-facing changes?

Currently if a user is using decimal(precision, scale) rather than decimal128(precision, scale) they will get a Decimal128Type if the precision is <= 38 (max precision for Decimal128) and Decimal256Type if the precision is higher. Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality:

  • for precisions [1 : 9] => Decimal32Type
  • for precisions [10 : 18] => Decimal64Type
  • for precisions [19 : 38] => Decimal128Type
  • for precisions [39 : 76] => Decimal256Type

While many of our tests currently make the assumption that decimal with a low precision would be Decimal128 and had to be updated, this may cause an initial surprise if users are making the same assumptions.

@github-actions
Copy link

github-actions bot commented Sep 4, 2024

⚠️ GitHub issue #43956 has been automatically assigned in GitHub to PR creator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here. (Should we file issues to come back to these?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are commented out because we didn't implement casting for the new decimal types. This is mentioned in the issue as check boxes to do rather than as an entirely separate issue currently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's going to be a separate PR, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i didn't want to make this already large PR even larger. I'll implement the cast kernels and so on as a follow-up PR

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Sep 5, 2024
@pitrou
Copy link
Member

pitrou commented Sep 5, 2024

Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality

I'm afraid this may massively break user code. I would suggest another approach:

  • deprecate the decimal() factory while keeping its current behavior of always returning at least decimal128
  • introduce a new smallest_decimal() factory that is documented to return the smallest possible type, and explicitly makes no guarantees about the stability of the return type

@wgtmac
Copy link
Member

wgtmac commented Sep 5, 2024

Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality

I'm afraid this may massively break user code. I would suggest another approach:

  • deprecate the decimal() factory while keeping its current behavior of always returning at least decimal128
  • introduce a new smallest_decimal() factory that is documented to return the smallest possible type, and explicitly makes no guarantees about the stability of the return type

I just have the same concern. +1 on the proposed workaround.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Sep 5, 2024
@zeroshade
Copy link
Member Author

@pitrou @bkietz @wgtmac I've updated this based on the suggestion, created a smallest_decimal function and added a deprecated message to the docstring for decimal.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Sep 5, 2024
@zeroshade zeroshade merged commit d55d4c6 into apache:main Sep 30, 2024
@zeroshade zeroshade deleted the cpp-decimal32-64 branch September 30, 2024 21:15
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d55d4c6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 27 possible false positives for unstable benchmarks that are known to sometimes produce them.

@pitrou
Copy link
Member

pitrou commented Oct 1, 2024

Hmm, did you notice the UBSAN failure in Decimal32Test.LeftShift?
https://github.com/apache/arrow/actions/runs/11115928849/job/30885255275#step:6:6273

(you can easily run this build locally using archery docker if you don't want to wait for CI every time :-))

thisisnic added a commit that referenced this pull request Jun 16, 2025
### Rationale for this change

32 and 64 bit Decimal types were added in C++ in #43957 but haven't been implemented in R yet

### What changes are included in this PR?

Implements them in R

### Are these changes tested?

Yup

### Are there any user-facing changes?

Yeah, new types but also the implicit downcasting so we should think about how to communicate this if at all

* GitHub Issue: #46719

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
alinaliBQ pushed a commit to Bit-Quill/arrow that referenced this pull request Jun 17, 2025
### Rationale for this change

32 and 64 bit Decimal types were added in C++ in apache#43957 but haven't been implemented in R yet

### What changes are included in this PR?

Implements them in R

### Are these changes tested?

Yup

### Are there any user-facing changes?

Yeah, new types but also the implicit downcasting so we should think about how to communicate this if at all

* GitHub Issue: apache#46719

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants