Image transforms library #18520
…ormers into image-processor-mixin
    def __call__(self, images, **kwargs) -> BatchFeature:
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs) -> BatchFeature:
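The two methods above make `__call__` a thin alias for `preprocess`. The following is a minimal sketch of that delegation pattern, not the PR's code. It assumes a toy processor that only rescales pixel values, and it returns a plain dict in place of `BatchFeature`. The class names and the `rescale_factor` parameter are hypothetical.

```python
from typing import Dict, List

import numpy as np


class ToyImageProcessorBase:
    """Hypothetical base class: calling the processor is the same as calling preprocess."""

    def __call__(self, images, **kwargs) -> Dict[str, np.ndarray]:
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs) -> Dict[str, np.ndarray]:
        raise NotImplementedError


class ToyRescaleProcessor(ToyImageProcessorBase):
    """Hypothetical subclass: only preprocess needs to be implemented."""

    def preprocess(self, images: List[np.ndarray], rescale_factor: float = 1 / 255) -> Dict[str, np.ndarray]:
        # Rescale each (height, width, channels) image and stack into (batch, height, width, channels).
        pixel_values = np.stack([image.astype(np.float32) * rescale_factor for image in images])
        # A plain dict stands in for BatchFeature in this sketch.
        return {"pixel_values": pixel_values}
```

With this split, `processor(images)` and `processor.preprocess(images)` behave identically, so subclasses only override `preprocess`.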
Ok, nice :) Could possibly also include a `postprocess` method, although I'm wondering whether that could be a method on the model (as it's oftentimes framework-specific).
Definitely stuff for later :)
cc @alaradirik
I didn't include a `postprocess` method, as one model can have many different downstream tasks. I was thinking of using a structure similar to the current feature extractors, where we have `postprocess_task` methods. What do you think?
I don't think the postprocessing methods should fall under the model if they're outside of what's needed for training. I wouldn't want to have to load a model in order to process outputs. Regarding framework-specific code, we'll have to think about how to handle this, as we need to support, and have consistent outputs for, the different framework implementations. If it's necessary to use specific libraries, we could have the image processor call the specific implementation, e.g. `postprocess_instance_segmentation` calls `_postprocess_instance_segmentation_pytorch`.
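The dispatch idea in the comment above can be sketched as follows. This is a minimal, hypothetical example, not code from the PR. It uses semantic segmentation (an argmax over the class dimension) as a simpler stand-in for instance segmentation. The framework is inferred from the type of the model outputs, and every private implementation returns NumPy, so callers get the same output type regardless of framework. The class and method names are assumptions modelled on the naming in the comment.

```python
import numpy as np


class ToySegmentationProcessor:
    """Hypothetical processor with one public postprocess_<task> method per task."""

    def postprocess_semantic_segmentation(self, logits):
        # Infer the framework from the module that defines the output type, e.g. "numpy" or "torch".
        framework = type(logits).__module__.split(".")[0]
        if framework == "numpy":
            return self._postprocess_semantic_segmentation_numpy(logits)
        if framework == "torch":
            return self._postprocess_semantic_segmentation_pytorch(logits)
        raise TypeError(f"Unsupported logits type: {type(logits)}")

    def _postprocess_semantic_segmentation_numpy(self, logits: np.ndarray) -> np.ndarray:
        # (batch, num_labels, height, width) -> (batch, height, width) label map
        return logits.argmax(axis=1)

    def _postprocess_semantic_segmentation_pytorch(self, logits) -> np.ndarray:
        # Same computation with torch ops; converting to NumPy keeps outputs consistent across frameworks.
        return logits.argmax(dim=1).cpu().numpy()
```

Because torch is only touched inside the PyTorch branch, the processor can be used without loading a model or importing a framework it doesn't need.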
@amyeroberts @NielsRogge I'm a bit late to the conversation but I agree that postprocessing methods shouldn't fall under the modeling files.
Adding postprocess_task methods sounds good to me! We just need to ensure consistent inputs and outputs for these where applicable.
        return_numpy: bool = True,
    ) -> np.ndarray:
        """
        Resizes `image` to (h, w) specified by `size` using the PIL library.
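The docstring above keeps the (h, w) order of NumPy shapes even though Pillow does the resizing. The following is a minimal sketch of such a function, not the PR's implementation. It assumes a channels-last `uint8` input and bilinear resampling. Only the Pillow call needs the (width, height) order.

```python
import numpy as np
import PIL.Image


def resize(image: np.ndarray, size: tuple, resample=PIL.Image.BILINEAR) -> np.ndarray:
    """Sketch: resize a (height, width, channels) uint8 array to size=(height, width) with Pillow."""
    height, width = size
    pil_image = PIL.Image.fromarray(image)
    # Pillow's resize takes (width, height), so the order is flipped only at this boundary.
    resized = pil_image.resize((width, height), resample=resample)
    # Back to NumPy: the result has shape (height, width, channels).
    return np.array(resized)
```

Callers never see Pillow's (width, height) convention, which is the argument made later in this thread for keeping `size` as (height, width).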
Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?
Also, won't this method support the same behaviour as torchvision's `resize`? I.e. when `size` is an int, it only resizes the smaller edge of the image.
> Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?

As the input and output to the function are NumPy arrays, I think that would be confusing. Ideally the user wouldn't need any knowledge of the library used to resize.

> Also, won't this method support the same behaviour as torchvision's resize? i.e. when size is an int, it only resizes the smaller edge of the image

I moved the logic for finding the dimensions of the output image into `get_resize_output_image_size` so that `resize` does just one thing. This is partly because there's a difference in behaviour between TF and PyTorch. Each model's image processor can then have its own logic within its `resize` method for finding the output shape, or use `get_resize_output_image_size`, which has the same behaviour as before.
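The shorter-edge behaviour discussed above can be sketched as a standalone size computation. This is a hypothetical helper that illustrates the behaviour; it is not the signature of `get_resize_output_image_size`. It assumes that an int `size` sets the shorter edge and that the longer edge is scaled to keep the aspect ratio and truncated to an int, as torchvision does. An explicit pair is treated as (height, width).

```python
from typing import Sequence, Tuple, Union


def compute_output_size(height: int, width: int, size: Union[int, Sequence[int]]) -> Tuple[int, int]:
    """Hypothetical helper: return the (height, width) an image should be resized to."""
    if not isinstance(size, int):
        # An explicit (height, width) pair is used as-is.
        if len(size) != 2:
            raise ValueError("size must be an int or a (height, width) pair")
        return int(size[0]), int(size[1])
    # An int sets the shorter edge; the longer edge keeps the aspect ratio (truncated to int).
    short, long = min(height, width), max(height, width)
    new_short, new_long = size, int(size * long / short)
    return (new_short, new_long) if height <= width else (new_long, new_short)
```

For example, a 480x640 image with `size=224` maps to (224, 298). Keeping this separate from `resize` lets a model substitute its own rule, which is what GLPN does in the thread.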
        return PIL.Image.fromarray(image)
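The `return PIL.Image.fromarray(image)` line above is the last step of a NumPy-to-PIL conversion such as `to_pil_image`. The following is a minimal sketch of such a conversion, not the PR's implementation. The channel-axis detection is a heuristic assumption. Float inputs are assumed to lie in [0, 1].

```python
import numpy as np
import PIL.Image


def to_pil_image(image: np.ndarray) -> PIL.Image.Image:
    """Sketch: convert a NumPy image (channels-first or channels-last) to a PIL image."""
    # Heuristic: treat a leading axis of size 1 or 3 as the channel axis when the last axis is not.
    if image.ndim == 3 and image.shape[0] in (1, 3) and image.shape[-1] not in (1, 3):
        image = image.transpose(1, 2, 0)
    # Floats are assumed to be in [0, 1] and are mapped to uint8 in [0, 255].
    if np.issubdtype(image.dtype, np.floating):
        image = (np.clip(image, 0.0, 1.0) * 255).round().astype(np.uint8)
    # Pillow builds single-channel ("L") images from 2D arrays.
    if image.ndim == 3 and image.shape[-1] == 1:
        image = image[..., 0]
    return PIL.Image.fromarray(np.ascontiguousarray(image))
```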
    def get_resize_output_image_size(
Where is this function used?
It's not used yet, because GLPN has its own logic for finding the output image size based on `size_divisor`.
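A `size_divisor`-based rule like the one mentioned above can be sketched as rounding each side down to a multiple of the divisor. This is an assumption about how such a rule typically works, not code taken from GLPN. The helper name is hypothetical.

```python
from typing import Tuple


def round_down_to_multiple(height: int, width: int, size_divisor: int = 32) -> Tuple[int, int]:
    """Hypothetical helper: largest (height, width) not above the input that is divisible by size_divisor."""
    if size_divisor <= 0:
        raise ValueError("size_divisor must be positive")
    # Floor division drops the remainder, e.g. 641 -> 640 for size_divisor=32.
    return (height // size_divisor) * size_divisor, (width // size_divisor) * size_divisor
```

A rule like this does not fit the "shorter edge or explicit (h, w)" pattern, which is why a per-model `resize` method can compute its own output shape.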
Co-authored-by: NielsRogge <[email protected]>
sgugger left a comment
Left a couple of nits but looking great!
Co-authored-by: Sylvain Gugger <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@alaradirik @NielsRogge Could you (re-)review?
alaradirik left a comment
All looks good to me!
I just have a question regarding multi-modal models such as CLIP and OWL-ViT. These models have both feature extractors and processors, which call their respective tokenizer and feature extractor. Wouldn't creating XXModelProcessor aliases for their feature extractors create issues?
@alaradirik I believe this should be OK, as the feature extractors are being mapped to
Use `PIL.Image.XXX` resampling values instead of the `PIL.Image.Resampling.XXX` enum, as the enum is only available in recent Pillow versions (>= 9.1.0), the Pillow version is not yet pinned, and support for older versions is deprecated.
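The commit above relies on the module-level constants rather than the `PIL.Image.Resampling` enum, which was added in Pillow 9.1.0. The following is a minimal sketch of a name-to-filter lookup built only on those constants. The dict and helper are hypothetical and are not part of the PR.

```python
import PIL.Image

# Hypothetical mapping from string names to Pillow resampling filters. It uses the
# module-level constants (PIL.Image.BILINEAR etc.) rather than PIL.Image.Resampling,
# so it does not require Pillow >= 9.1.0.
RESAMPLE_FILTERS = {
    "nearest": PIL.Image.NEAREST,
    "bilinear": PIL.Image.BILINEAR,
    "bicubic": PIL.Image.BICUBIC,
    "lanczos": PIL.Image.LANCZOS,
}


def get_resample_filter(name: str):
    """Look up a Pillow resampling filter by case-insensitive name."""
    try:
        return RESAMPLE_FILTERS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown resampling filter: {name!r}") from None
```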
What does this PR do?
This is the first of a series of PRs to replace feature extractors with image processors for vision models.
Create a new module `image_transforms.py` that will contain functions for transforming images, e.g. `resize`. The functions are designed to:
`to_pil_image`)
Subsequent PRs: