GH-34888: [C++][Parquet] Writer supports adding extra kv meta #34889
1748c1f to c5b322d
@wjones127 @westonpace Please take a look when you have time. Thanks!
cpp/src/parquet/metadata.cc
Outdated
I think we can remove this?
Fixed
mapleFU
left a comment
Rest LGTM
Fixed it to be backward compatible. Please take a look again. Thanks! @wjones127
https://github.com/apache/arrow/actions/runs/4662936307/jobs/8253813543?pr=34889 is failing:
wjones127
left a comment
Just one minor comment, otherwise looks good!
Co-authored-by: Will Jones <[email protected]>
class PARQUET_EXPORT FileMetaDataBuilder {
 public:
  // API convenience to get a MetaData reader
  ARROW_DEPRECATED("Deprecated in 12.0.0. Use overload without KeyValueMetadata instead.")
- ARROW_DEPRECATED("Deprecated in 12.0.0. Use overload without KeyValueMetadata instead.")
+ ARROW_DEPRECATED("Deprecated in 13.0.0. Use overload without KeyValueMetadata instead.")
Should I change this since it may not be included in the 12.0.0 release? @wjones127
I merged this and then realized I have just missed the window for 12.0.0. @raulcd would it be possible to add this to the release branch? Otherwise I'll put up a PR to adjust the text of the deprecation warning added in this PR (which says a method was deprecated in 12.0.0).
I will add it to 12.0.0
Benchmark runs are scheduled for baseline = 4963105 and contender = 3bd57e3. 3bd57e3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
### Rationale for this change

The Parquet specification supports storing user-provided key-value metadata. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes a user may want to add extra information while writing, so it is useful to support adding extra key-value metadata at any time before the file writer is closed.

### What changes are included in this PR?

Add a new interface `void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata)` to the `ParquetFileWriter` class. Users can now add more key-value metadata to the file as long as the writer is not closed.

### Are these changes tested?

Added a new `Metadata.TestAddKeyValueMetadata` test to verify that key-value metadata added before closing the writer is preserved.

### Are there any user-facing changes?

Yes, users can add custom key-value metadata at any time before the writer is closed.

* Closes: #34888

Lead-authored-by: Gang Wu <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Signed-off-by: Will Jones <[email protected]>
…nd PyArrow (#41633)

### Rationale for this change

The previous PR (#34889) added `AddKeyValueMetadata` to FileWriter. Now we should export it to the Parquet Arrow and Python APIs.

### What changes are included in this PR?

1. Add `AddKeyValueMetadata` in parquet::arrow
2. Add `add_key_value_metadata` in pyarrow
3. Testing

### Are these changes tested?

Yes

### Are there any user-facing changes?

New API that allows adding key-value metadata to a Parquet file.

* GitHub Issue: #41608

Authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
Rationale for this change
The Parquet specification supports storing user-provided key-value metadata. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes a user may want to add extra information while writing, so it is useful to support adding extra key-value metadata at any time before the file writer is closed.
What changes are included in this PR?
Add a new interface `void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata)` to the `ParquetFileWriter` class. Users can now add more key-value metadata to the file as long as the writer is not closed.
Are these changes tested?
Added a new `Metadata.TestAddKeyValueMetadata` test to verify that key-value metadata added before closing the writer is preserved.
Are there any user-facing changes?
Yes, users can add custom key-value metadata at any time before the writer is closed.
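To make the intended semantics concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use the Arrow or Parquet libraries: `ToyFileWriter`, `KeyValueMap`, `footer()`, and `WriteWithLateMetadata` are hypothetical names invented for illustration, and only the method name `AddKeyValueMetadata` is taken from this PR. The handling of duplicate keys (overwrite) and of calls after closing (throw) are assumptions of the sketch, not statements about what parquet-cpp does.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a Parquet file writer. This is NOT the Arrow API;
// it only models when key-value metadata may be added.
class ToyFileWriter {
 public:
  using KeyValueMap = std::map<std::string, std::string>;

  // Metadata supplied at construction plays the role of what Open() accepts.
  explicit ToyFileWriter(KeyValueMap initial = KeyValueMap())
      : pending_(std::move(initial)) {}

  // Merge extra entries into the pending metadata. Assumptions of this
  // sketch: a duplicate key is overwritten, and a closed writer throws.
  void AddKeyValueMetadata(const KeyValueMap& extra) {
    if (closed_) throw std::logic_error("writer is already closed");
    for (const auto& entry : extra) pending_[entry.first] = entry.second;
  }

  // Closing stands in for writing the footer: pending metadata becomes final.
  void Close() {
    if (closed_) return;
    footer_ = pending_;
    closed_ = true;
  }

  const KeyValueMap& footer() const { return footer_; }

 private:
  KeyValueMap pending_;
  KeyValueMap footer_;
  bool closed_ = false;
};

// Usage: metadata known only after the rows are written is added before Close().
ToyFileWriter::KeyValueMap WriteWithLateMetadata() {
  ToyFileWriter::KeyValueMap initial;
  initial["created_by"] = "example";
  ToyFileWriter writer(initial);

  // ... row groups would be written here ...

  ToyFileWriter::KeyValueMap extra;
  extra["row_count"] = "3";
  writer.AddKeyValueMetadata(extra);

  writer.Close();
  return writer.footer();
}
```

In this model the footer ends up holding both the metadata passed at construction and the entry added later, which is the property the `Metadata.TestAddKeyValueMetadata` test is described as checking for the real writer.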