@HyukjinKwon HyukjinKwon commented Dec 11, 2025

Rationale for this change

# TODO: This will not handle prohibited characters in nested field names

This TODO was first introduced in 4a6a6cb, which did not implement the logic to handle nested types.

What changes are included in this PR?

This PR implements nested handling when sanitizing field names in ParquetWriter (flavor='spark').
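
To illustrate what nested handling means here, the following is a minimal sketch of recursive field-name sanitization. It is not the code from this PR: the tuple-based schema model, the helper names (`sanitize_name`, `sanitize_field`, `sanitize_type`), and the exact set of disallowed characters are assumptions made for the example.

```python
import re

# Assumption: the characters treated as prohibited in field names.
_DISALLOWED = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_name(name):
    """Replace every prohibited character with an underscore."""
    return _DISALLOWED.sub("_", name)


def sanitize_type(typ):
    """Recurse into nested types; primitives (plain strings) pass through.

    Hypothetical type model for this sketch:
      "int64"                      -> primitive
      ("struct", [field, ...])     -> struct with child fields
      ("list", field)              -> list with a single element field
    """
    if isinstance(typ, str):
        return typ
    kind, payload = typ
    if kind == "struct":
        return (kind, [sanitize_field(child) for child in payload])
    if kind == "list":
        return (kind, sanitize_field(payload))
    raise ValueError(f"unsupported type kind: {kind!r}")


def sanitize_field(field):
    """A field is a (name, type) pair; sanitize the name and its children."""
    name, typ = field
    return (sanitize_name(name), sanitize_type(typ))


schema = [
    ("a b", ("struct", [
        ("c;d", "int64"),
        ("e", ("list", ("item (x)", "string"))),
    ])),
]
sanitized = [sanitize_field(f) for f in schema]
```

Sanitizing only the top level would rewrite `a b` but leave `c;d` and `item (x)` untouched, which is the gap the TODO describes. Recursing through the type fixes names at every depth.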

Are these changes tested?

Unit tests were added, and the change was manually tested via:

pytest python/pyarrow/tests/parquet/test_basic.py -k 'sanitized_spark' -v

Are there any user-facing changes?

Yes. Nested field names are now also sanitized when ParquetWriter is used with flavor set to 'spark'.

@github-actions

⚠️ GitHub issue #48455 has been automatically assigned in GitHub to PR creator.
