-
Notifications
You must be signed in to change notification settings - Fork 2.8k
API: required nested fields within optional structs can produce null #13804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@dejangvozdenac thanks for reporting and fixing the issue. I agree that can you add unit test in Spark module to cover the scenario you described? |
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/BoundReference.java
Outdated
Show resolved
Hide resolved
parquet/src/test/java/org/apache/iceberg/parquet/TestBloomRowGroupFilter.java
Outdated
Show resolved
Hide resolved
thanks for the review @stevenzwu, I appreciate it! I addressed all the comments, let me know if anything further is needed. |
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestFilterPushDown.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReference.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We will need 2 or 3 more committers' approval, since this modifies critical api code.
awesome, thanks for the quick review @stevenzwu. what's the usually process here? should I tag 2-3 people in who have reviewed code under api or do you have suggestions? |
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestFilterPushDown.java
Outdated
Show resolved
Hide resolved
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
Outdated
Show resolved
Hide resolved
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
Outdated
Show resolved
Hide resolved
@singhpk234 @pvary @nastra do you mind taking a look? I see you're familiar with this part of the code and @stevenzwu mentioned we need more reviews. |
api/src/test/java/org/apache/iceberg/expressions/TestBoundReference.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReference.java
Show resolved
Hide resolved
Co-authored-by: Eduard Tudenhoefner <[email protected]>
b489986
to
535c8a1
Compare
producesNull check is used when binding isNull and notNull predicate. Currently, the logic states that a field can produce null if and only if that field is optional. This is not true, however, in the case of required fields nested within optional structs. The field itself can produce nulls if the parent struct is null despite it being required.
I'm able to reproduce this case in Trino and Spark by creating the following schema and adding rows to it:
address.street is null for row 0, but trino/spark using iceberg api disagree:
You can see that when Trino reads the entire file, it correctly determines the row:
This also leads to unexpected behavior where null or not null check behaves differently based on the binding order:
After this change, iceberg can find the null row:
Closes #13328 (and relatedly trinodb/trino#20511)