Skip to content

Iceberg connector gives different results for nested null checks with projection pushdown enabled vs disabled #20511

@zpuller

Description

@zpuller

Iceberg connector gives different results for nested null checks with projection pushdown enabled vs disabled, specifically in the case of a schema with an optional struct containing a required inner field, eg.

message table {
  required int32 id = 1;
  required binary name (STRING) = 2;
  required int32 age = 3;
  optional group address = 4 {
    required binary street (STRING) = 5;
    optional group address_info = 6 {
      required binary city (STRING) = 7;
      required binary county (STRING) = 8;
      required binary state (STRING) = 9;
    }
  }
}

To reproduce, it is required to use something like Spark to write the table. We observed this with parquet format specifically. It cannot be created with Trino (to my knowledge) because SQL syntax does not permit specifying NOT NULL for nested types.

I created an iceberg table using the following spark sql query:

SELECT
  *
FROM VALUES
  (1, 'John Doe', 35, CAST(NULL AS STRUCT<street: STRING, address_info: STRUCT<city: STRING, county: STRING, state: TRING>>)),
  (2, 'Jane Doe', 27, named_struct(
    'street',
    CAST('456 Lane' AS STRING),
    'address_info',
    CAST(struct('San Francisco', 'San Francisco', 'CA') AS STRUCT<city: STRING, county: STRING, state: STRING>)
  )),
  (3, 'Mary Johnson', 30, named_struct(
    'street',
    CAST('789 Boulevard' AS STRING),
    'address_info',
    CAST(struct('Portland', 'Multnomah', 'OR') AS STRUCT<city: STRING, county: STRING, state: STRING>)
  )) AS t(id, name, age, address)

(I also had to manually tweak the schema to get the right set of optional and required fields)

then queried from Trino as follows:

set session iceberg.projection_pushdown_enabled=false
OR
set session iceberg.projection_pushdown_enabled=true

select
  id
from
  iceberg.schema.table
where
  address.street is null

This returns 1 row when pushdown is disabled, and 0 rows with pushdown enabled.

I verified this behavior on v437.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions