-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Description
Feature Request / Improvement
Currently, Iceberg's FlinkSink does not provide a native, production-safe mechanism for passing application-level per-checkpoint metadata—such as min/max event time—into Iceberg snapshot properties during commit.
Example use case:
We want to track the max/min event time for rows written in each Flink checkpoint (i.e., each Iceberg snapshot) and include those values in the snapshot summary properties. This would enable downstream systems and data consumers to reliably understand the event-time boundaries for each snapshot.
{
// Example snapshot summary properties in Iceberg
// Statistics currently collected automatically by FlinkSink/Iceberg
"total-records": "575132000",
// Proposed requirement: application-supplied statistics emitted from Flink job
"max-event-time": "1763840004000",
// ...additional custom application-level metrics as needed
"number-customer-types": "3"
}
We expect there are many similar use cases where applications want to emit custom statistics or metadata for each committed snapshot, it's similar to what is currently recorded in CommitSummary, but as user-defined/application-defined key-value pairs about the same data.
We previously attempted to solve this in PR #14594, but realized that achieving production safety (across parallelism) requires an external store (such as Redis) to pass metadata from the application to the committer. Before exploring further, we would like to seek input from the community on a Flink-native solution or recommended direction for supporting this.
Query engine
Flink
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time