Skip to content

Conversation

1c3t3a
Copy link
Contributor

@1c3t3a 1c3t3a commented Aug 28, 2025

XMP is a common metadata format and multiple image formats supported by this crate are able to extract XMP metadata. Similiar to the icc profile and exif metadata, we extend the ImageDecoder trait to provide this functionality.

For now this is only implemented for png and webp and tiff.

This is related to #2568.

@1c3t3a 1c3t3a changed the title Provide a way to extract XMP metadata (png-only for now) Provide a way to extract XMP metadata (png & webp only for now) Aug 29, 2025
XMP is a common metadata format and multiple image formats supported
by this crate are able to extract XMP metadata. Similiar to the icc
profile and exif metadata, we extend the ImageDecoder trait to provide
this functionality.

For now this is only implemented for png.
@1c3t3a 1c3t3a force-pushed the xmp-metadata branch 3 times, most recently from 4cdedb2 to 53f7700 Compare September 1, 2025 10:27
@1c3t3a
Copy link
Contributor Author

1c3t3a commented Sep 1, 2025

Yehaa, I rebased after #2574, looks like this is ready to go from my side? Any further points to do @197g?

@1c3t3a 1c3t3a changed the title Provide a way to extract XMP metadata (png & webp only for now) Provide a way to extract XMP metadata (png & webp & tiff only for now) Sep 1, 2025
@1c3t3a
Copy link
Contributor Author

1c3t3a commented Sep 1, 2025

Sorry for the heck-meck, I decided to add Tiff now as well, since the roll pulled in the fix for processing the Bytes in Tiff :)

Copy link
Member

@197g 197g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually having more decoders support it, that set in particular is really core, makes the addition much more convincing as common behavior.

@197g 197g merged commit 9afe256 into image-rs:main Sep 1, 2025
32 checks passed
@Shnatsel
Copy link
Member

Shnatsel commented Sep 1, 2025

zune-jpeg crate also supports XMP, but only in the 0.5.x series which is yet to see a stable release, while image is still on 0.4.x.

It would be great to coordinate with the zune-jpeg author to clear the remaining issues and cut a release.

@fintelia
Copy link
Contributor

fintelia commented Sep 1, 2025

This PR unfortunately makes it harder to fix the memory limit handling for the PNG decoder because it requires PngDecoder::new to read all the text chunks into memory before we know what the memory limit should be. Not impossible to fix, but potentially quite tricky

@Shnatsel
Copy link
Member

Shnatsel commented Sep 1, 2025

Isn't that what we added the Seek bound for?

@1c3t3a
Copy link
Contributor Author

1c3t3a commented Sep 1, 2025

I am also happy to move the metadata extraction to the png crate if it fits better there. That way we could just peek the keyword and otherwise seek over the contents?

@197g
Copy link
Member

197g commented Sep 1, 2025

Isn't that what we added the Seek bound for?

Yes, since we've moved to ignore ancillary chunks that are broken we might also partially ignore them when they exceed memory limits. And by partially I mean keep a list of offsets where they occur so that their contents can be selectively retrieved. That should combine well with interfaces to read such chunks without fully allocating them in memory (also planned for chunks we want to ignore). We could add a flag to do so preemptively while keeping only a prefix in memory, enough to determine if they are relevant for XMP or Adobe embeds. I think there are a lot of potential variants that won't break functionality (with minor coordination to use those features in image as they are released maybe).

@fintelia
Copy link
Contributor

fintelia commented Sep 1, 2025

Thought about this a bit more, and I think there's a way we can implement this without needing invasive changes to the png crate chunk parsing state machine. The main point would be storing the positions of text chunks during initial parsing, and only seeking to each one and reading its label if the xmp_metadata method is called. libpng defaults to only storing 1000 text chunks, so we could similarly bound the worst-case behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants