Iceberg writes to match file size #3823


### Discussed in https://github.com/Eventual-Inc/Daft/discussions/3815

<div type='discussions-op-text'>

<sup>Originally posted by **gero90** February 14, 2025</sup>
If there is any way to estimate Parquet file size in `df.write_iceberg()`, it would be really nice to aim for Parquet files close in size to the Iceberg table property `write.target-file-size-bytes` (default 512 MiB).

Having Parquet files close to that size makes Iceberg reads more efficient and reduces the table maintenance (compaction) that needs to be performed.

As an example, when I know the total data is small, I call `df.into_partitions(1)` right before `df.write_iceberg()` to get a single file per write.

Thanks in advance for taking a look and for making Daft awesome!</div>
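The `into_partitions(1)` workaround generalizes to larger writes: if an estimate of the output Parquet size is available, the number of partitions (roughly one file each) can be derived from `write.target-file-size-bytes`. The sketch below assumes the table properties arrive as a plain string-to-string dict (for example, from pyiceberg's `Table.properties`); the helper names `target_file_size_bytes` and `num_output_partitions` are hypothetical, and producing `estimated_parquet_bytes` is left open, since that estimate is exactly what this request asks Daft to provide.

```python
import math

# Iceberg's default for write.target-file-size-bytes: 512 MiB.
DEFAULT_TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024


def target_file_size_bytes(table_properties: dict[str, str]) -> int:
    """Read the target file size from Iceberg table properties, falling back to the default."""
    value = table_properties.get("write.target-file-size-bytes")
    return int(value) if value is not None else DEFAULT_TARGET_FILE_SIZE_BYTES


def num_output_partitions(estimated_parquet_bytes: int, table_properties: dict[str, str]) -> int:
    """Hypothetical helper: how many partitions to write so files land near the target size.

    Assumes data is spread evenly across partitions and each partition becomes one file.
    """
    target = target_file_size_bytes(table_properties)
    # Round up so the average file size does not exceed the target; always write at least one file.
    return max(1, math.ceil(estimated_parquet_bytes / target))


if __name__ == "__main__":
    # ~1200 MiB of estimated Parquet output with the default 512 MiB target -> 3 files.
    print(num_output_partitions(1200 * 1024 * 1024, {}))
```

With Daft, the result could then be passed as `df.into_partitions(n)` right before `df.write_iceberg(table)`. This only controls the partition count, so actual file sizes still depend on how evenly rows are distributed and how well the data compresses.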
