@yutannihilation
Contributor

The ability to preserve categorical values was introduced in #5077 through the convention of storing a special ARROW:schema key in the file metadata. To enable this, we need to call ArrowWriterProperties::Builder::store_schema().

The R binding already supports this, but it calls store_schema() only conditionally and otherwise falls back to parquet___default_arrow_writer_properties(). I don't see the motivation for implementing it that way in #5451. Since the Python binding always calls store_schema(), I think the R code can do the same.

@github-actions

github-actions bot commented Jan 7, 2020

Member

@nealrichardson nealrichardson left a comment

Thanks for this! LGTM

This makes me question whether default_arrow_writer_properties is right or useful, given that both R and Python are no longer using it. Should this store_schema behavior be pushed down to C++? @wesm

@yutannihilation
Contributor Author

Thanks for merging!

@yutannihilation yutannihilation deleted the ARROW-7045_preserve_factor_in_parquet branch January 8, 2020 05:42