Deterministically normalize wheel ZIP metadata #578

@tabbyrobin

To enable reproducible builds, it would be nice if auditwheel could deterministically normalize the ZIP metadata of the wheels it repairs.

Ideally this would either be done by default or be easy to enable with a single option (for example, --deterministic or similar).

Sources of nondeterminism in ZIP metadata include:

  • Timestamps: auditwheel already normalizes ZIP timestamps if SOURCE_DATE_EPOCH is set, but it would be helpful to normalize them even when SOURCE_DATE_EPOCH is not set.
  • File permissions and ownership
  • Ordering of ZIP entries: this already seems to be effectively deterministic, at least in the handful of experiments I've run, but I'm not sure what nondeterminism might appear depending on OS, OS variant, Python build backend, etc.
  • Potentially other things
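
To make the request concrete, here is a rough sketch of what such a normalization pass could look like using only the standard-library zipfile module. This is not auditwheel's code or API: the function name normalize_zip, the fixed default timestamp, the alphabetical ordering, and the 0644/0755 permission policy are all assumptions chosen for illustration.

```python
import zipfile

# Hypothetical fixed timestamp (the earliest the ZIP format can represent).
DEFAULT_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(src, dst, date_time=DEFAULT_DATE_TIME):
    """Rewrite the archive `src` into `dst` with normalized entry metadata.

    `src` and `dst` may be paths or binary file-like objects.
    File contents are copied unchanged; only ZIP metadata is rewritten.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # Assumption: a fixed (sorted) entry order is acceptable.
        for name in sorted(zin.namelist()):
            old = zin.getinfo(name)

            # A fresh ZipInfo drops extra fields, which is where tools
            # may store ownership (uid/gid) and high-resolution times.
            info = zipfile.ZipInfo(filename=name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # always record "Unix", even on Windows

            # Keep only the "is it executable?" bit of the original mode.
            old_mode = old.external_attr >> 16
            if name.endswith("/"):
                mode = 0o040755  # directory entry
            elif old_mode & 0o111:
                mode = 0o100755  # executable regular file
            else:
                mode = 0o100644  # regular file
            info.external_attr = mode << 16

            zout.writestr(info, zin.read(name))
```

Because the file contents are untouched, the hashes listed in a wheel's RECORD file would stay valid. Byte-for-byte identical output still assumes the same compression library and settings on both builds.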

I am filing this issue with auditwheel in the hope that a fix here, in a centralized tool, could serve as a blanket solution and improve wheel reproducibility throughout the Python ecosystem.

See also:
