Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Jan 16, 2023

What changes are included in this PR?

  • Rename the function in the writer properties to enable_short_decimal_stored_as_integer.
  • Rephrase the comment to explain the behavior in detail.

@wgtmac
Copy link
Member Author

wgtmac commented Jan 16, 2023

@pitrou What about this?

@wjones127
Copy link
Member

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I've added several suggestions :)

///
/// According to the specs, DECIMAL can be used to annotate the following types:
/// - int32: for 1 <= precision <= 9.
/// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"precision < 10 will produce a warning." is this true of our implementation? Maybe we should skip saying that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the specs. In our impl no warning is produced because we have fixed physical types for decimals and guarantee that precision < 10 will not use int64 type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps we should change to this? So that we document our libraries behavior, and not just what the spec says?

Suggested change
/// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning.
/// - int64: for 10 <= precision <= 18.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is done, I think this is ready to be merged :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Please check again.

Comment on lines 474 to 476
/// As a consequence, decimal columns stored in integer types are more compact
/// but in a risk that the parquet file may not be readable by previous Arrow C++
/// versions or other implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// As a consequence, decimal columns stored in integer types are more compact
/// but in a risk that the parquet file may not be readable by previous Arrow C++
/// versions or other implementations.
/// As a consequence, decimal columns stored in integer types are more compact.

I'm not sure it's worth saying that older versions don't support them. In Arrow C++, read support for these was added at the same time for the other physical types for decimals (original pr). And that was a long time ago, before version 0.10.0. I'm sure it was similarly well supported in parquet-mr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @pitrou

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked that both apache/parquet-mr and apache/impala support reading this long ago. It seems that the native parquet reader in the velox does not handle this yet. I'm not familiar with other implementations but there may be some do not support it.

IMHO, we cannot take care of all other implementations. There is no risk to the parquet-cpp itself because the reader support is already in and it is well documented by the parquet specs. So I have removed the comment about the risk. @wjones127 @pitrou

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this, sounds fine to me.

@wgtmac wgtmac requested a review from wjones127 January 24, 2023 02:48
@wgtmac
Copy link
Member Author

wgtmac commented Jan 25, 2023

I found that this feature is already
included in the maint-11.0.0 branch. Should I cherry-pick this once merged? @wjones127

@wjones127
Copy link
Member

I found that this feature is already included in the maint-11.0.0 branch. Should I cherry-pick this once merged?

I'm afraid it might be too late for that, since the release vote has already passed. So this will have to be a minor breaking change in the 12.0.0 release.

@wjones127 wjones127 merged commit 068f499 into apache:master Jan 26, 2023
@ursabot
Copy link

ursabot commented Jan 27, 2023

Benchmark runs are scheduled for baseline = b7fd793 and contender = 068f499. 068f499 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.38% ⬆️0.1%] test-mac-arm
[Failed ⬇️0.46% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.12% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 068f4995 ec2-t3-xlarge-us-east-2
[Failed] 068f4995 test-mac-arm
[Failed] 068f4995 ursa-i9-9960x
[Finished] 068f4995 ursa-thinkcentre-m75q
[Finished] b7fd7939 ec2-t3-xlarge-us-east-2
[Failed] b7fd7939 test-mac-arm
[Failed] b7fd7939 ursa-i9-9960x
[Finished] b7fd7939 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants