Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Apr 4, 2023

Rationale for this change

Parquet specs support storing key-value metadata provided by the user. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes user may want to add extra information to it while writing. So it is good to support adding extra key-value metadata any time before closing the file writer.

What changes are included in this PR?

Add a new interface void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata) to the ParquetFileWriter class. User can now add more key-value metadata to the file if not closed.

Are these changes tested?

Added a new Metadata.TestAddKeyValueMetadata test to verify key-value metadata added before closing the writer are well preserved.

Are there any user-facing changes?

Yes, user can add custom key-value metadata whenever writer is not closed.

@wgtmac wgtmac requested a review from wjones127 as a code owner April 4, 2023 15:47
@github-actions
Copy link

github-actions bot commented Apr 4, 2023

@github-actions
Copy link

github-actions bot commented Apr 4, 2023

⚠️ GitHub issue #34888 has been automatically assigned in GitHub to PR creator.

@wgtmac
Copy link
Member Author

wgtmac commented Apr 7, 2023

@wjones127 @westonpace Please take a look when you have time. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 7, 2023
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 9, 2023
@wgtmac wgtmac requested a review from wjones127 April 10, 2023 01:42
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Apr 10, 2023
@wgtmac
Copy link
Member Author

wgtmac commented Apr 10, 2023

Fixed it to be backward compatible. Please take a look again. Thanks! @wjones127

@wgtmac
Copy link
Member Author

wgtmac commented Apr 11, 2023

https://github.com/apache/arrow/actions/runs/4662936307/jobs/8253813543?pr=34889 is failing:

+ doxygen
/arrow/cpp/apidoc /
warning: Tag 'OUTPUT_TEXT_DIRECTION' at line 119 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'COLS_IN_ALPHA_INDEX' at line 1118 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'FORMULA_TRANSPARENT' at line 1557 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'LATEX_SOURCE_CODE' at line 1865 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'RTF_SOURCE_CODE' at line 1955 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'DOCBOOK_PROGRAMLISTING' at line 2060 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'CLASS_DIAGRAMS' at line 2257 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'DOT_FONTNAME' at line 2299 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'DOT_FONTSIZE' at line 2306 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
warning: Tag 'DOT_TRANSPARENT' at line 2530 of file 'Doxyfile' has become obsolete.
         To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
/arrow/cpp/src/parquet/metadata.h:531: error: Found ';' while parsing initializer list! (doxygen could be confused by a macro call without semicolon) (warning treated as error, aborting now)
1
Error: `docker-compose --file /home/runner/work/arrow/arrow/docker-compose.yml run --rm conda-python-docs` exited with a non-zero exit code 1, see the process log above.

@wgtmac wgtmac requested a review from wjones127 April 11, 2023 06:05
Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment, otherwise looks good!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 11, 2023
class PARQUET_EXPORT FileMetaDataBuilder {
public:
// API convenience to get a MetaData reader
ARROW_DEPRECATED("Deprecated in 12.0.0. Use overload without KeyValueMetadata instead.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARROW_DEPRECATED("Deprecated in 12.0.0. Use overload without KeyValueMetadata instead.")
ARROW_DEPRECATED("Deprecated in 13.0.0. Use overload without KeyValueMetadata instead.")

Should I change this since it may not be included in the 12.0.0 release? @wjones127

@wjones127 wjones127 merged commit 3bd57e3 into apache:main Apr 13, 2023
@wjones127
Copy link
Member

I merged this and then realized I have just missed the window for 12.0.0. @raulcd would it be possible to add this to the release branch? Otherwise I'll put up a PR to adjust the text of the deprecation warning added in this PR (which says a method was deprecated in 12.0.0).

@raulcd
Copy link
Member

raulcd commented Apr 13, 2023

@raulcd would it be possible to add this to the release branch?

I will add it to 12.0.0

@ursabot
Copy link

ursabot commented Apr 15, 2023

Benchmark runs are scheduled for baseline = 4963105 and contender = 3bd57e3. 3bd57e3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️20.92% ⬆️1.53%] ursa-i9-9960x
[Finished ⬇️1.11% ⬆️0.12%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3bd57e35 ec2-t3-xlarge-us-east-2
[Failed] 3bd57e35 test-mac-arm
[Finished] 3bd57e35 ursa-i9-9960x
[Finished] 3bd57e35 ursa-thinkcentre-m75q
[Finished] 49631057 ec2-t3-xlarge-us-east-2
[Failed] 49631057 test-mac-arm
[Finished] 49631057 ursa-i9-9960x
[Finished] 49631057 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented Apr 15, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

raulcd pushed a commit that referenced this pull request Apr 17, 2023
### Rationale for this change

Parquet specs support storing key-value metadata provided by the user. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes user may want to add extra information to it while writing. So it is good to support adding extra key-value metadata any time before closing the file writer.

### What changes are included in this PR?

Add a new interface `void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata)` to the `ParquetFileWriter` class. User can now add more key-value metadata to the file if not closed.

### Are these changes tested?

Added a new `Metadata.TestAddKeyValueMetadata` test to verify key-value metadata added before closing the writer are well preserved.

### Are there any user-facing changes?

Yes, user can add custom key-value metadata whenever writer is not closed.
* Closes: #34888

Lead-authored-by: Gang Wu <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Signed-off-by: Will Jones <[email protected]>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…pache#34889)

### Rationale for this change

Parquet specs support storing key-value metadata provided by the user. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes user may want to add extra information to it while writing. So it is good to support adding extra key-value metadata any time before closing the file writer.

### What changes are included in this PR?

Add a new interface `void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata)` to the `ParquetFileWriter` class. User can now add more key-value metadata to the file if not closed.

### Are these changes tested?

Added a new `Metadata.TestAddKeyValueMetadata` test to verify key-value metadata added before closing the writer are well preserved.

### Are there any user-facing changes?

Yes, user can add custom key-value metadata whenever writer is not closed.
* Closes: apache#34888

Lead-authored-by: Gang Wu <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Signed-off-by: Will Jones <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…pache#34889)

### Rationale for this change

Parquet specs support storing key-value metadata provided by the user. However, the parquet-cpp writer can only set it via ParquetFileWriter::Open(). Sometimes user may want to add extra information to it while writing. So it is good to support adding extra key-value metadata any time before closing the file writer.

### What changes are included in this PR?

Add a new interface `void AddKeyValueMetadata(std::shared_ptr<const KeyValueMetadata> key_value_metadata)` to the `ParquetFileWriter` class. User can now add more key-value metadata to the file if not closed.

### Are these changes tested?

Added a new `Metadata.TestAddKeyValueMetadata` test to verify key-value metadata added before closing the writer are well preserved.

### Are there any user-facing changes?

Yes, user can add custom key-value metadata whenever writer is not closed.
* Closes: apache#34888

Lead-authored-by: Gang Wu <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Signed-off-by: Will Jones <[email protected]>
mapleFU added a commit that referenced this pull request Jun 4, 2024
…nd PyArrow (#41633)

### Rationale for this change

The previous pr ( #34889 ) add a `AddKeyValueMetadata` to FileWriter. And now we should export it to Parquet Arrow and Python API.

### What changes are included in this PR?

1. Add `AddKeyValueMetadata` in parquet::arrow
2. Add `add_key_value_metadata` in pyarrow
3. testing

### Are these changes tested?

Yes

### Are there any user-facing changes?

New api allowing add key-value metadata to Parquet file

* GitHub Issue: #41608

Authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Parquet] Writer cannot add extra key-value metadata once opened

5 participants