ARROW-386: [Java] Respect case of struct / map field names #261
|
I expect this will also show up if added to the integration tests. @julienledem @jacques-n I'm not sure the timeline for using the Arrow jars in Drill, but I presume case-sensitivity is something you'd be able to handle without too much trouble on the application side? |
|
@alphalfalfa can you change the PR title to start with |
|
I think we need to support both behaviors. We have a bunch of code that supports case insensitivity that is built on this code. -1 until we come up with a better solution. Maybe @julienledem will have some ideas but he is out this week |
|
@jacques-n this is blocking use of Arrow on some internal use cases where we have case sensitive field names. We are also working on Arrow Java<->C++ in Spark -- I'm not sure if Spark SQL metadata is case sensitive or not. |
|
I'm not worried about the internal name change (bigInt > bigint). The key is the change to mapwriters. Can you add an option in MapWriters when you construct one to whether they are case sensitive or not? Keep the default behavior as it was but allow a fully case preserving/sensitive alternative? |
|
@jacques-n, without the name change of bigInt -> bigint, the promotableWriter test case in TestComplexWriter.java would fail. The problematic code block is the following: After BigIntWriter is promoted to UnionWriter, the internalMap of UnionVector already has an entry named "bigint". Somehow, an additional entry named "bigInt" is created, and the subsequent operations then go wrong. As for the default behavior, is that the best solution? Maybe it is only me, but I would never expect a data writer to silently lower-case field names provided by users. Of course, if there are already products built on the case insensitivity, it makes sense to keep it as the default. |
|
@wesm title changed, is this good? |
|
@alphalfalfa: I was saying that the bigint change seems correct. For defaults: we've built a bunch of code on top of this already so changing the default would be quite challenging. |
… as case-insensitive)
|
@jacques-n I've added the option and set default to be case-insensitive |
| @Override | ||
| public MapWriter map(String name) { | ||
| FieldWriter writer = fields.get(name.toLowerCase()); | ||
| FieldWriter writer = fields.get(handleCase(name)); |
@alphalfalfa he's referring to the changes in this function and below. I think Jacques is proposing having a CaseSensitiveMapWriter
@wesm @jacques-n I am a bit confused here, as I don't have the full picture of how these writer classes are used. What is the benefit of subclassing into CaseSensitiveMapWriter?
julienledem
left a comment
Here are my comments on this.
| return this.caseSensitive? input : input.toLowerCase(); | ||
| } | ||
| public boolean getCaseSensitivity() { |
isCaseSensitive()
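To make the flag-based approach under review concrete, here is a minimal sketch. The class `FlaggedMapWriterSketch` and its `field` and `fieldCount` methods are hypothetical stand-ins invented for illustration, not Arrow classes; only `handleCase` and the `caseSensitive` field mirror the diff above, and the getter uses the suggested `isCaseSensitive()` name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a generated map writer; not the Arrow class. */
class FlaggedMapWriterSketch {
  private final Map<String, Object> fields = new HashMap<>();
  private final boolean caseSensitive;

  FlaggedMapWriterSketch(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  /** Normalizes a field name according to the configured sensitivity. */
  private String handleCase(String input) {
    return this.caseSensitive ? input : input.toLowerCase();
  }

  boolean isCaseSensitive() {
    return caseSensitive;
  }

  /** Returns the child stored under the normalized name, creating it on first use. */
  Object field(String name) {
    return fields.computeIfAbsent(handleCase(name), key -> new Object());
  }

  int fieldCount() {
    return fields.size();
  }

  public static void main(String[] args) {
    FlaggedMapWriterSketch insensitive = new FlaggedMapWriterSketch(false);
    insensitive.field("bigInt");
    insensitive.field("bigint");
    System.out.println("case-insensitive fields: " + insensitive.fieldCount());

    FlaggedMapWriterSketch sensitive = new FlaggedMapWriterSketch(true);
    sensitive.field("bigInt");
    sensitive.field("bigint");
    System.out.println("case-sensitive fields: " + sensitive.fieldCount());
  }
}
```

With the flag off, both spellings collapse to one entry; with it on, they stay distinct. Note that the flag is checked on every lookup in this version.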
| void clear(); | ||
| void copyReader(FieldReader reader); | ||
| MapWriter rootAsMap(); | ||
| MapWriter rootAsMap(Boolean caseSensitive); |
I think ComplexWriter itself is case sensitive or not. And this method is not needed.
On a side note this should be boolean not Boolean (but it does not apply since I'm suggesting to remove it)
I would prefer to configure the case sensitivity directly on ComplexWriter as well. But I have only found MapWriters lower-casing field names, which confused me a bit and made me think the case sensitivity probably only applies to the MapWriters.
|
|
||
| @Override | ||
| public MapWriter rootAsMap() { | ||
| public MapWriter rootAsMap(Boolean caseSensitive) { |
I'd suggest that we add instead an optional caseSensitive parameter to the Constructor of ComplexWriterImpl
|
|
||
| @Override | ||
| public MapWriter rootAsMap() { | ||
| return rootAsMap(null); |
s/null/false?
There are more than two situations to handle here, so using Boolean and null is probably not the best solution. What would you suggest?
- init call, caseSensitive = null -> case insensitive
- init call, caseSensitive = true/false -> case sensitive/insensitive
- non-init call, caseSensitive = null -> already initialized, doing nothing
- non-init call, caseSensitive = true/false -> if not the same with initialized sensitivity, IllegalArgumentException is thrown, otherwise, doing nothing
nvm, i guess it doesn't matter if case sensitivity is configured when ComplexWriter is constructed.
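For reference, the four cases listed above can be sketched as follows. `RootInitSketch` is a hypothetical class written only to illustrate the tri-state `Boolean` semantics; it returns the effective sensitivity rather than a MapWriter, and it is not the code in the PR. Configuring the sensitivity at construction time, as the thread goes on to do, makes this bookkeeping unnecessary.

```java
/** Hypothetical illustration of the tri-state rootAsMap(Boolean) semantics. */
class RootInitSketch {
  // null until the first rootAsMap call, then the effective sensitivity
  private Boolean initialized;

  /** Returns the effective case sensitivity after applying the four cases. */
  boolean rootAsMap(Boolean caseSensitive) {
    if (initialized == null) {
      // Init call: null means the case-insensitive default,
      // true/false selects the behavior explicitly.
      initialized = caseSensitive != null && caseSensitive;
    } else if (caseSensitive != null && !caseSensitive.equals(initialized)) {
      // Non-init call that contradicts the initialized sensitivity.
      throw new IllegalArgumentException(
          "writer already initialized with caseSensitive=" + initialized);
    }
    // Non-init call with null or a matching value: nothing to do.
    return initialized;
  }

  public static void main(String[] args) {
    RootInitSketch writer = new RootInitSketch();
    System.out.println(writer.rootAsMap(null));
    System.out.println(writer.rootAsMap(false));
    try {
      writer.rootAsMap(true);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```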
|
thanks for the update @alphalfalfa. |
|
LGTM also |
|
The modifications don't address my concerns. Specifically, we want to avoid evaluating the boolean in map() every single time it is called. A common pattern when parsing JSON strings is calling this repeatedly. Let's just have two versions of the implementation. In most cases, only one would get loaded, so specialization could occur. |
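A rough sketch of the two-implementation idea, under stated assumptions: `MapWriterSketch`, `LowerCasingMapWriterSketch`, and `CasePreservingMapWriterSketch` are hypothetical names invented here, not the classes generated from the MapWriters template. The sensitivity is decided once in the factory, so `map()` contains no per-call branch on a flag. Whether the JIT actually specializes the call site is the reviewer's expectation and is not measured here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base type; the choice of subclass is made once, at creation. */
abstract class MapWriterSketch {
  protected final Map<String, Object> fields = new HashMap<>();

  abstract Object map(String name);

  int fieldCount() {
    return fields.size();
  }

  /** Picks the implementation up front instead of branching in map(). */
  static MapWriterSketch create(boolean caseSensitive) {
    return caseSensitive
        ? new CasePreservingMapWriterSketch()
        : new LowerCasingMapWriterSketch();
  }
}

/** Default behavior: field names are lower-cased before lookup. */
final class LowerCasingMapWriterSketch extends MapWriterSketch {
  @Override
  Object map(String name) {
    return fields.computeIfAbsent(name.toLowerCase(), key -> new Object());
  }
}

/** Alternative behavior: field names are used exactly as given. */
final class CasePreservingMapWriterSketch extends MapWriterSketch {
  @Override
  Object map(String name) {
    return fields.computeIfAbsent(name, key -> new Object());
  }
}

class TwoWritersDemo {
  public static void main(String[] args) {
    MapWriterSketch writer = MapWriterSketch.create(true);
    writer.map("Key");
    writer.map("key");
    System.out.println("fields: " + writer.fieldCount());
  }
}
```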
|
thanks for clarifying @jacques-n.
|
|
@alphalfalfa Do you need support for nested maps? It does not look like your current PR passes down the caseSensitive attribute to nested maps. |
|
If condition in rootAsMap seems fine. Agreed that the whole tree should probably be case sensitive or not (not just a part of it). |
|
sure, I can make the change. @julienledem I probably don't need support for nested maps. Do you think I should pass down the attribute? If so, can you point me to the right portion of code? |
|
@alphalfalfa I think for correctness it is better to pass down the caseSensitivity attribute to nested maps. That means passing down that attribute to ComplexWriters returned by your writer:
|
|
PR updated. Hope I cover all the possible cases. |
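As an illustration of what passing the attribute down means, here is a small sketch in which every nested writer inherits the setting of its parent, so the whole tree is either case sensitive or not. `NestedWriterSketch` and its methods are hypothetical and only model the idea; they are not the Arrow writer classes touched by the PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tree of map writers sharing one case-sensitivity setting. */
class NestedWriterSketch {
  private final boolean caseSensitive;
  private final Map<String, NestedWriterSketch> children = new HashMap<>();

  NestedWriterSketch(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  /** Returns the nested writer for the name, created with the parent's setting. */
  NestedWriterSketch map(String name) {
    String key = caseSensitive ? name : name.toLowerCase();
    return children.computeIfAbsent(key, k -> new NestedWriterSketch(caseSensitive));
  }

  int childCount() {
    return children.size();
  }

  public static void main(String[] args) {
    NestedWriterSketch root = new NestedWriterSketch(true);
    NestedWriterSketch outer = root.map("Outer");
    outer.map("Inner");
    outer.map("inner");
    System.out.println("nested children: " + outer.childCount());
  }
}
```

Without the propagation, a case-sensitive root could hand out nested writers that silently fall back to lower-casing, which is the inconsistency raised above.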
julienledem
left a comment
LGTM
+1
@jacques-n ?
| public ListWriter list(String name) { | ||
| FieldWriter writer = fields.get(handleCase(name)); | ||
| String finalName = handleCase(name); | ||
| FieldWriter writer = fields.get(handleCase(finalName)); |
you forgot to remove the call to handleCase here
| public ListWriter list(String name) { | ||
| FieldWriter writer = fields.get(name.toLowerCase()); | ||
| String finalName = handleCase(name); | ||
| FieldWriter writer = fields.get(handleCase(finalName)); |
You forgot to remove the call to handleCase here
|
@jacques-n @wesm more comments? |
|
LGTM +1 |
|
Thank you all and thanks @alphalfalfa for the careful revisions |
|
thanks @alphalfalfa ! We appreciate your contribution. |
Changes include:
- Remove all toLowerCase() calls on field names in MapWriters.java template file, so that the writers can respect case of the field names.
- Use lower-case keys for internalMap in UnionVector instead of camel-case (e.g. bigInt -> bigint). p.s. I don't know what is the original purpose of using camel case here. It did not conflict because all field names are converted to lower cases in the past.
- Add a simple test case of MapWriter with mixed-case field names.

Author: Jingyuan Wang <[email protected]>

Closes apache#261 from alphalfalfa/arrow-386 and squashes the following commits:

cd08145 [Jingyuan Wang] Remove unnecessary handleCase() call
7b28bfc [Jingyuan Wang] Pass caseSensitive Attribute down to nested MapWriters
2fe7bcf [Jingyuan Wang] Separate MapWriters with CaseSensitiveMapWriters
d269e21 [Jingyuan Wang] Configure case sensitivity when constructing ComplexWriterImpl
cba60d1 [Jingyuan Wang] Add option to MapWriters to configure the case sensitivity (defaulted as case-insensitive)
51da2a1 [Jingyuan Wang] Arrow-386: [Java] Respect case of struct / map field names