GH-43956: [C++][Format] Add initial Decimal32/Decimal64 implementations #43957

zeroshade · 2024-09-04T19:40:18Z

Rationale for this change

Extending the decimal types beyond Decimal128/256 to bit widths of 32 and 64 improves interoperability with libraries and utilities that already support these widths. It also opens up more opportunities for zero-copy exchange with systems such as libcudf and various databases.

What changes are included in this PR?

This PR contains the basic C++ implementations of the Decimal32/Decimal64 types, arrays, builders and scalars. It also includes the minimum needed to get everything compiling and the tests passing, without extending the Acero kernels or the Parquet handling (both of which will be handled in follow-up PRs).

Are these changes tested?

Yes, tests were extended where applicable to add decimal32/decimal64 cases.

Are there any user-facing changes?

Currently, a user who calls decimal(precision, scale) rather than decimal128(precision, scale) gets a Decimal128Type if the precision is <= 38 (the maximum precision for Decimal128) and a Decimal256Type if the precision is higher. Following the same pattern, with this change decimal(precision, scale) returns the following types when used instead of the specific decimal32/decimal64/decimal128/decimal256 functions:

for precisions [1 : 9] => Decimal32Type
for precisions [10 : 18] => Decimal64Type
for precisions [19 : 38] => Decimal128Type
for precisions [39 : 76] => Decimal256Type

Many of our tests assumed that a decimal with a low precision would be Decimal128 and had to be updated. Users who make the same assumption may be surprised at first.
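The precision ranges listed above can be sketched as a small selection function. This is only an illustration under stated assumptions: `SmallestDecimalBitWidth` is a hypothetical helper, not part of Arrow, and it returns a bit width rather than a `DataType`. The cut-offs follow from the largest power of ten that fits in each signed width (for example, 10^9 - 1 fits in 32 bits and 10^18 - 1 fits in 64 bits).

```cpp
#include <cstdint>

// Hypothetical helper, not an Arrow API: returns the bit width of the
// narrowest decimal type that can hold `precision` digits, following the
// ranges in the PR description. Returns 0 for a precision outside [1, 76].
constexpr int32_t SmallestDecimalBitWidth(int32_t precision) {
  return (precision < 1 || precision > 76) ? 0
         : precision <= 9                  ? 32
         : precision <= 18                 ? 64
         : precision <= 38                 ? 128
                                           : 256;
}
```

Code that previously relied on a low precision yielding a 128-bit type would see a narrower width under this mapping, which is the surprise the description warns about.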

GitHub Issue: [Format] Add Decimal32 and Decimal64 to Arrow #43956

github-actions · 2024-09-04T19:40:45Z

⚠️ GitHub issue #43956 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/type.cc

cpp/src/arrow/array/builder_dict.h

cpp/src/arrow/compute/kernels/codegen_internal.h

lidavidm · 2024-09-05T00:37:17Z

cpp/src/arrow/testing/gtest_util.h

Ditto here. (Should we file issues to come back to these?)

These are commented out because we didn't implement casting for the new decimal types. This is currently tracked as to-do checkboxes in the issue rather than as an entirely separate issue.

But it's going to be a separate PR, right?

Yes, I didn't want to make this already large PR even larger. I'll implement the cast kernels and so on in a follow-up PR.

pitrou · 2024-09-05T10:01:26Z

Following the same pattern, this change means that using decimal(precision, scale) instead of the specific decimal32/decimal64/decimal128/decimal256 functions results in the following functionality

I'm afraid this may massively break user code. I would suggest another approach:

deprecate the decimal() factory while keeping its current behavior of always returning at least decimal128
introduce a new smallest_decimal() factory that is documented to return the smallest possible type, and explicitly makes no guarantees about the stability of the return type
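The difference between the two factories in this suggestion can be sketched as follows. Both functions are hypothetical stand-ins that return bit widths, not Arrow's real signatures. The first mirrors the existing decimal() behavior described in the PR (at least 128 bits), and the second mirrors the proposed smallest_decimal() behavior.

```cpp
#include <cstdint>

// Hypothetical stand-ins, not Arrow's real signatures: each returns a bit
// width instead of a DataType, and 0 for a precision outside [1, 76].

// Behavior kept for the deprecated decimal() factory: never narrower than
// 128 bits, so existing callers keep getting the type they expect.
constexpr int32_t LegacyDecimalBitWidth(int32_t precision) {
  return (precision < 1 || precision > 76) ? 0
         : precision <= 38                 ? 128
                                           : 256;
}

// Behavior proposed for smallest_decimal(): the narrowest width that fits.
constexpr int32_t ProposedSmallestBitWidth(int32_t precision) {
  return (precision < 1 || precision > 76) ? 0
         : precision <= 9                  ? 32
         : precision <= 18                 ? 64
         : precision <= 38                 ? 128
                                           : 256;
}
```

The two functions agree for precisions of 19 and above and differ only below that, which is the range where silently changing decimal() would affect existing code.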

cpp/src/arrow/type.cc

wgtmac · 2024-09-05T15:26:12Z

I just have the same concern. +1 on the proposed workaround.

zeroshade · 2024-09-05T17:51:37Z

@pitrou @bkietz @wgtmac I've updated this based on the suggestion: I created a smallest_decimal function and added a deprecation message to the docstring for decimal.

Co-authored-by: Antoine Pitrou <[email protected]>

conbench-apache-arrow · 2024-10-01T04:52:09Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d55d4c6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 27 possible false positives for unstable benchmarks that are known to sometimes produce them.

pitrou · 2024-10-01T08:14:09Z

Hmm, did you notice the UBSAN failure in Decimal32Test.LeftShift?
https://github.com/apache/arrow/actions/runs/11115928849/job/30885255275#step:6:6273

(you can easily run this build locally using archery docker if you don't want to wait for CI every time :-))

### Rationale for this change

32 and 64 bit Decimal types were added in C++ in #43957 but haven't been implemented in R yet

### What changes are included in this PR?

Implements them in R

### Are these changes tested?

Yup

### Are there any user-facing changes?

Yeah, new types but also the implicit downcasting so we should think about how to communicate this if at all

* GitHub Issue: #46719

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>

zeroshade requested review from bkietz, felipecrv, joellubi, lidavidm and pitrou September 4, 2024 19:40

zeroshade requested review from westonpace and wgtmac as code owners September 4, 2024 19:40

github-actions bot added the Component: Parquet, Component: C++, awaiting committer review, Component: Documentation and Component: Python labels Sep 4, 2024

lidavidm reviewed Sep 5, 2024

github-actions bot added the awaiting changes and awaiting change review labels and removed the awaiting committer review and awaiting changes labels Sep 5, 2024

wgtmac reviewed Sep 5, 2024

github-actions bot added the awaiting changes and awaiting change review labels and removed the awaiting change review and awaiting changes labels Sep 5, 2024

zeroshade and others added 8 commits September 30, 2024 11:27

simplify a bunch of tests with a generic typed_test

b44888d

use FromRealApprox

e2957a9

static_cast instead of implicit cast

980d6fb

remove special cases, adjust tests

2a3e5c4

Update cpp/src/arrow/util/decimal.cc

c723754

Co-authored-by: Antoine Pitrou <[email protected]>

more updates from comments

5382eb4

add reference to issue for decimal32 approx

9fda783

make RoundedRightShift a no-op

af8c722

zeroshade force-pushed the cpp-decimal32-64 branch from 154ea65 to af8c722 Compare September 30, 2024 15:27

github-actions bot removed Component: Java Component: C# labels Sep 30, 2024

zeroshade added 4 commits September 30, 2024 12:32

fix tests

48639e3

avoid ASAN issue

b110605

fix ubsan test

1d97e27

fix ubsan

39032f2

zeroshade merged commit d55d4c6 into apache:main Sep 30, 2024

zeroshade deleted the cpp-decimal32-64 branch September 30, 2024 21:15

kou mentioned this pull request Oct 1, 2024

[C++] Decimal32 support introduced an UBSAN error #44276

Closed

mapleFU mentioned this pull request Oct 9, 2024

[C++][Parquet] arrow Decimal32/Decimal64 write Parquet and testing #44345

Closed

ianmcook mentioned this pull request Nov 13, 2024

[Python] Add support for Decimal32 and Decimal64 #44713

Closed

This was referenced Jun 5, 2025

[Benchmarking][R] conbench is failing #46716

Open

[R] Add 32 and 64 bit Decimal types #46719

Closed

GH-46719: [R] Add 32 and 64 bit Decimal types #46720

Merged

This was referenced Jun 16, 2025

[C++] Test precision range for decimal data type in C++ #30667

Closed

[C++] Decimal64/32 support? #38622

Closed

[C++] Add support for Decimal16, Decimal32 and Decimal64 #25483

Closed
