Iceberg writes to match file size #3823

@kevinzwang

Discussed in #3815

Originally posted by gero90 February 14, 2025
If there is any way to estimate the Parquet file size in df.write_iceberg(), it would be really nice to produce Parquet files whose size is close to the Iceberg table property write.target-file-size-bytes (default is 512 MiB).

Having Parquet files close to that size makes Iceberg reads more efficient and reduces the amount of table maintenance (compaction) needed.

As an example, when I know the total data is small, I call df.into_partitions(1) right before df.write_iceberg() to get a single file per write.
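The manual workaround above could be generalized roughly as follows. This is only a sketch, not how Daft sizes files: the helper names `target_file_size` and `partitions_for_target` are hypothetical, the size estimate must come from the caller, and it assumes one output file per partition, as in the `into_partitions(1)` case. An in-memory size estimate will usually differ from the compressed Parquet size on disk, so the result is approximate at best.

```python
# Iceberg's documented default for write.target-file-size-bytes (512 MiB).
DEFAULT_TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024


def target_file_size(table_properties: dict) -> int:
    """Read the target file size from a table-properties mapping (hypothetical helper).

    Property values are typically stored as strings, so convert to int.
    Falls back to the Iceberg default when the property is not set.
    """
    raw = table_properties.get("write.target-file-size-bytes")
    return int(raw) if raw is not None else DEFAULT_TARGET_FILE_SIZE_BYTES


def partitions_for_target(estimated_bytes: int, target_bytes: int) -> int:
    """Partition count such that each partition holds at most target_bytes (hypothetical helper)."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Integer ceiling division; always write at least one partition.
    return max(1, (estimated_bytes + target_bytes - 1) // target_bytes)


# Assumed usage with Daft and PyIceberg (not executed here; estimated_bytes
# is a caller-supplied guess at the written size):
#   n = partitions_for_target(estimated_bytes, target_file_size(table.properties))
#   df.into_partitions(n).write_iceberg(table)
```

With a small write and no property set, this yields a single partition, which matches the `into_partitions(1)` workaround.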

Thanks in advance for taking a look and for making Daft awesome!


Labels: enhancement (New feature or request)