-
Notifications
You must be signed in to change notification settings - Fork 4k
MINOR: [C++][Parquet] Rephrase decimal annotation #33694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@pitrou What about this? |
0c9e284 to
bc94680
Compare
bc94680 to
08f5d60
Compare
|
Breadcrumb: These changes are in response to comments: |
wjones127
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I've added several suggestions :)
cpp/src/parquet/properties.h
Outdated
| /// | ||
| /// According to the specs, DECIMAL can be used to annotate the following types: | ||
| /// - int32: for 1 <= precision <= 9. | ||
| /// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"precision < 10 will produce a warning." is this true of our implementation? Maybe we should skip saying that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from the specs. In our impl no warning is produced because we have fixed physical types for decimals and guarantee that precision < 10 will not use int64 type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then perhaps we should change to this? So that we document our libraries behavior, and not just what the spec says?
| /// - int64: for 1 <= precision <= 18; precision < 10 will produce a warning. | |
| /// - int64: for 10 <= precision <= 18. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this is done, I think this is ready to be merged :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Please check again.
cpp/src/parquet/properties.h
Outdated
| /// As a consequence, decimal columns stored in integer types are more compact | ||
| /// but in a risk that the parquet file may not be readable by previous Arrow C++ | ||
| /// versions or other implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// As a consequence, decimal columns stored in integer types are more compact | |
| /// but in a risk that the parquet file may not be readable by previous Arrow C++ | |
| /// versions or other implementations. | |
| /// As a consequence, decimal columns stored in integer types are more compact. |
I'm not sure it's worth saying that older versions don't support them. In Arrow C++, read support for these was added at the same time for the other physical types for decimals (original pr). And that was a long time ago, before version 0.10.0. I'm sure it was similarly well supported in parquet-mr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @pitrou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked that both apache/parquet-mr and apache/impala support reading this long ago. It seems that the native parquet reader in the velox does not handle this yet. I'm not familiar with other implementations but there may be some do not support it.
IMHO, we cannot take care of all other implementations. There is no risk to the parquet-cpp itself because the reader support is already in and it is well documented by the parquet specs. So I have removed the comment about the risk. @wjones127 @pitrou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this, sounds fine to me.
|
I found that this feature is already |
I'm afraid it might be too late for that, since the release vote has already passed. So this will have to be a minor breaking change in the 12.0.0 release. |
|
Benchmark runs are scheduled for baseline = b7fd793 and contender = 068f499. 068f499 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
What changes are included in this PR?
enable_short_decimal_stored_as_integer.