Conversation

@amyeroberts amyeroberts commented Aug 8, 2022

What does this PR do?

This is the first of a series of PRs to replace feature extractors with image processors for vision models.

Create a new module image_transforms.py that will contain functions for transforming images e.g. resize.

The functions are designed to:

  • Accept numpy arrays.
  • Return numpy arrays (except for e.g. to_pil_image).
  • Provide logic such that the new image processors produce the same outputs as the feature extractors when called directly (a rough sketch of this interface follows below).
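
As a rough illustration of that interface (the function below is illustrative, not necessarily one of the functions added in this PR), a transform takes a numpy array in and returns a numpy array out:

import numpy as np

def rescale(image: np.ndarray, scale: float) -> np.ndarray:
    # Accepts a numpy array and returns a numpy array, per the design above.
    return (image * scale).astype(np.float32)

# Example: bring uint8 pixel values into the [0, 1] range.
image = np.zeros((224, 224, 3), dtype=np.uint8)
rescaled = rescale(image, scale=1 / 255)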

Subsequent PRs:

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

def __call__(self, images, **kwargs) -> BatchFeature:
    return self.preprocess(images, **kwargs)

def preprocess(self, images, **kwargs) -> BatchFeature:
Contributor

Ok, nice :) could possibly also include a postprocess method, although I'm wondering whether that could be a method on the model (as it's oftentimes framework-specific)

Definitely stuff for later :)

cc @alaradirik

Contributor Author

I didn't include a postprocess method, as one model can have many different downstream tasks. I was thinking of using a structure similar to the current feature extractors where we have postprocess_task methods. What do you think?

I don't think the postprocessing methods should fall under the model if they're outside of what's needed for training; I wouldn't want to have to load a model just to process outputs. Regarding framework-specific code, we'll have to think about how to handle this, as we need to support the different framework implementations and keep their outputs consistent. If specific libraries are necessary, we could have the image processor call the specific implementation, e.g. postprocess_instance_segmentation calls _postprocess_instance_segmentation_pytorch.
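
A minimal sketch of that dispatch idea (the method names and output format here are hypothetical, not the final API):

import numpy as np

class ImageProcessor:
    def postprocess_instance_segmentation(self, outputs, framework: str = "pt") -> np.ndarray:
        # Route to a framework-specific helper so callers get consistent numpy outputs.
        if framework == "pt":
            return self._postprocess_instance_segmentation_pytorch(outputs)
        raise ValueError(f"Unsupported framework: {framework!r}")

    def _postprocess_instance_segmentation_pytorch(self, outputs) -> np.ndarray:
        import torch

        # Hypothetical example: reduce per-pixel class logits to a label map.
        logits = outputs["logits"]  # shape (batch, num_classes, height, width)
        return torch.argmax(logits, dim=1).cpu().numpy()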

Contributor

@alaradirik alaradirik Sep 28, 2022

@amyeroberts @NielsRogge I'm a bit late to the conversation but I agree that postprocessing methods shouldn't fall under the modeling files.

Adding postprocess_task methods sounds good to me! We just need to ensure consistent inputs and outputs for these where applicable.

    return_numpy: bool = True,
) -> np.ndarray:
    """
    Resizes `image` to (h, w) specified by `size` using the PIL library.
Contributor

@NielsRogge NielsRogge Aug 22, 2022

Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?

Contributor

@NielsRogge NielsRogge Aug 22, 2022

Also, won't this method support the same behaviour as torchvision's resize? i.e. when size is an int, it only resizes the smaller edge of the image

Contributor Author

Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?

As the input and output to the function are numpy arrays, I think that would be confusing. Ideally the user wouldn't have any knowledge about the library used to resize.

Also, won't this method support the same behaviour as torchvision's resize? i.e. when size is an int, it only resizes the smaller edge of the image

I moved the logic to find the dimensions of the output image into get_resize_output_image_size so that resize does just one thing. This is in part because there's a difference in behaviour between TF and PyTorch. Then each model's image processor can have its own logic within its resize method for finding the output shape, or use get_resize_output_image_size which has the same behaviour as before.
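
For reference, a sketch of the kind of logic get_resize_output_image_size can hold when size is a single int (mirroring torchvision by resizing the shorter edge and preserving the aspect ratio); the actual signature and defaults in the PR may differ:

import numpy as np

def get_resize_output_image_size(image: np.ndarray, size) -> tuple:
    # An explicit (height, width) pair is used as-is.
    if isinstance(size, (tuple, list)):
        return tuple(size)
    # A single int is treated as the target length of the shorter edge;
    # the longer edge is scaled to preserve the aspect ratio.
    height, width = image.shape[:2]  # assumes a (height, width, channels) array
    short, long = (height, width) if height <= width else (width, height)
    new_long = int(round(size * long / short))
    return (size, new_long) if height <= width else (new_long, size)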

return PIL.Image.fromarray(image)


def get_resize_output_image_size(
Contributor

@NielsRogge NielsRogge Aug 22, 2022

Where is this function used?

Contributor Author

It's not used yet, because GLPN has its own logic for finding the output image size based on size_divisor.

Collaborator

@sgugger sgugger left a comment

Left a couple of nits but looking great!

Co-authored-by: Sylvain Gugger <[email protected]>

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@amyeroberts
Contributor Author

@alaradirik @NielsRogge Could you (re-)review?

Contributor

@alaradirik alaradirik left a comment

All looks good to me!

I just have a question regarding multi-modal models such as CLIP and OWL-ViT. These models have both feature extractors and processors, which call their respective tokenizer and feature extractor. Wouldn't creating XXModelProcessor aliases for their feature extractors create issues?

@amyeroberts
Contributor Author

I just have a question regarding multi-modal models such as CLIP and OWL-ViT. These models have both feature extractors and processors, which call their respective tokenizer and feature extractor. Wouldn't creating XXModelProcessor aliases for their feature extractors create issues?

@alaradirik I believe this should be OK, as the feature extractors are being mapped to XxxImageProcessor rather than XxxProcessor, so there's no clash of names. Not sure if this answers your question or I've missed the consequence you're asking about.
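
For illustration, the naming can stay collision-free with an alias along these lines (a hypothetical sketch using CLIP; not necessarily how the PR or its follow-ups implement it):

class CLIPImageProcessor:
    """Stand-in for the new image processor class."""

class CLIPFeatureExtractor(CLIPImageProcessor):
    """Hypothetical backwards-compatibility alias: the old feature extractor name
    maps to the image processor, while CLIPProcessor (tokenizer + image processor)
    keeps its own, distinct name, so there is no clash."""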

Use PIL.Image.XXX resampling values instead of the PIL.Image.Resampling.XXX enum, as the enum is only available in recent Pillow versions (>= 9.1.0), the Pillow version is not yet pinned, and the older constants are only deprecated rather than removed.
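
A small compatibility shim along these lines illustrates the idea (the use of the packaging library and the PILImageResampling name are assumptions for this sketch, not necessarily what the commit does):

import PIL
import PIL.Image
from packaging import version

# Image.Resampling only exists in Pillow >= 9.1.0; fall back to the older
# module-level constants so unpinned, older Pillow installs keep working.
if version.parse(PIL.__version__) >= version.parse("9.1.0"):
    PILImageResampling = PIL.Image.Resampling
else:
    PILImageResampling = PIL.Image

resample = PILImageResampling.BILINEAR  # works with both old and new Pillow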