# Image transforms library #18520
Changes from all commits
@@ -0,0 +1,30 @@

```markdown
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Utilities for Image Processors

This page lists all the utility functions used by the image processors, mainly the functional
transformations used to process the images.

Most of those are only useful if you are studying the code of the image processors in the library.

## Image Transformations

[[autodoc]] image_transforms.rescale

[[autodoc]] image_transforms.resize

[[autodoc]] image_transforms.to_pil_image

## ImageProcessorMixin

[[autodoc]] image_processing_utils.ImageProcessorMixin
```
@@ -0,0 +1,54 @@

```python
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .feature_extraction_utils import BatchFeature as BaseBatchFeature
from .feature_extraction_utils import FeatureExtractionMixin
from .utils import logging


logger = logging.get_logger(__name__)


# TODO: Move BatchFeature to be imported by both feature_extraction_utils and image_processing_utils
# We override the class string here, but logic is the same.
class BatchFeature(BaseBatchFeature):
    r"""
    Holds the output of the image processor specific `__call__` methods.

    This class is derived from a python dictionary and can be used as a dictionary.

    Args:
        data (`dict`):
            Dictionary of lists/arrays/tensors returned by the `__call__` method ('pixel_values', etc.).
        tensor_type (`Union[None, str, TensorType]`, *optional*):
            You can give a tensor_type here to convert the lists of integers to PyTorch/TensorFlow/Numpy tensors at
            initialization.
    """


# We use aliasing whilst we phase out the old API. Once feature extractors for vision models
# are deprecated, ImageProcessorMixin will be implemented. Any shared logic will be abstracted out.
ImageProcessorMixin = FeatureExtractionMixin


class BaseImageProcessor(ImageProcessorMixin):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def __call__(self, images, **kwargs) -> BatchFeature:
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs) -> BatchFeature:
        raise NotImplementedError("Each image processor must implement its own preprocess method")
```
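To make the intended subclassing contract concrete — `__call__` delegating to a subclass-provided `preprocess` that returns a `BatchFeature` — here is a minimal standalone sketch. The stand-in classes and the `ToyImageProcessor` are illustrative assumptions, not the library's implementation:

```python
import numpy as np


class BatchFeature(dict):
    """Stand-in for the real BatchFeature: holds processor output like a dictionary."""


class BaseImageProcessor:
    """Mirrors the diff's pattern: calling the processor delegates to preprocess."""

    def __call__(self, images, **kwargs):
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs):
        raise NotImplementedError("Each image processor must implement its own preprocess method")


class ToyImageProcessor(BaseImageProcessor):
    """Hypothetical subclass: scales uint8 pixel values into [0, 1]."""

    def preprocess(self, images, **kwargs):
        pixel_values = [np.asarray(img, dtype=np.float32) / 255.0 for img in images]
        return BatchFeature({"pixel_values": pixel_values})


processor = ToyImageProcessor()
batch = processor([np.full((2, 2, 3), 255, dtype=np.uint8)])
```

Since `BatchFeature` derives from `dict`, callers can treat the result as a plain dictionary of model inputs.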
@@ -0,0 +1,259 @@

```python
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING, List, Optional, Tuple, Union

import numpy as np

from transformers.utils.import_utils import is_flax_available, is_tf_available, is_torch_available, is_vision_available


if is_vision_available():
    import PIL

    from .image_utils import (
        ChannelDimension,
        get_image_size,
        infer_channel_dimension_format,
        is_jax_tensor,
        is_tf_tensor,
        is_torch_tensor,
    )


if TYPE_CHECKING:
    if is_torch_available():
        import torch
    if is_tf_available():
        import tensorflow as tf
    if is_flax_available():
        import jax.numpy as jnp
```
```python
def to_channel_dimension_format(image: np.ndarray, channel_dim: Union[ChannelDimension, str]) -> np.ndarray:
    """
    Converts `image` to the channel dimension format specified by `channel_dim`.

    Args:
        image (`numpy.ndarray`):
            The image to have its channel dimension set.
        channel_dim (`ChannelDimension`):
            The channel dimension format to use.

    Returns:
        `np.ndarray`: The image with the channel dimension set to `channel_dim`.
    """
    if not isinstance(image, np.ndarray):
        raise ValueError(f"Input image must be of type np.ndarray, got {type(image)}")

    current_channel_dim = infer_channel_dimension_format(image)
    target_channel_dim = ChannelDimension(channel_dim)
    if current_channel_dim == target_channel_dim:
        return image

    if target_channel_dim == ChannelDimension.FIRST:
        image = image.transpose((2, 0, 1))
    elif target_channel_dim == ChannelDimension.LAST:
        image = image.transpose((1, 2, 0))
    else:
        raise ValueError(f"Unsupported channel dimension format: {channel_dim}")

    return image
```
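The conversion is a pure transpose, so it can be checked on shapes alone. A quick numpy sketch of both cases, standalone and without the library's `ChannelDimension` enum:

```python
import numpy as np

# A 16x32 RGB image in channels-last (height, width, channels) layout.
image = np.zeros((16, 32, 3))

# LAST -> FIRST: (h, w, c) -> (c, h, w), matching image.transpose((2, 0, 1)).
channels_first = image.transpose((2, 0, 1))

# FIRST -> LAST: (c, h, w) -> (h, w, c), matching image.transpose((1, 2, 0)).
channels_last = channels_first.transpose((1, 2, 0))
```

Transposing returns a view, so the round trip costs no copies; only the strides change.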
```python
def rescale(
    image: np.ndarray, scale: float, data_format: Optional[ChannelDimension] = None, dtype=np.float32
) -> np.ndarray:
    """
    Rescales `image` by `scale`.

    Args:
        image (`np.ndarray`):
            The image to rescale.
        scale (`float`):
            The scale to use for rescaling the image.
        data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the image. If not provided, it will be the same as the input image.
        dtype (`np.dtype`, *optional*, defaults to `np.float32`):
            The dtype of the output image. Used for backwards compatibility with feature extractors.

    Returns:
        `np.ndarray`: The rescaled image.
    """
    if not isinstance(image, np.ndarray):
        raise ValueError(f"Input image must be of type np.ndarray, got {type(image)}")

    rescaled_image = image * scale
    if data_format is not None:
        rescaled_image = to_channel_dimension_format(rescaled_image, data_format)
    rescaled_image = rescaled_image.astype(dtype)
    return rescaled_image
```
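`rescale` is elementwise multiplication followed by an optional layout change and a dtype cast. A minimal standalone sketch of the arithmetic, mapping uint8 pixels into [0, 1] (numpy only, no transformers imports):

```python
import numpy as np

image = np.array([[0, 128, 255]], dtype=np.uint8)

# Same arithmetic as rescale(image, 1 / 255): multiply, then cast to the output dtype.
rescaled = (image * (1 / 255)).astype(np.float32)
```

The multiplication promotes the uint8 array to float64, which is why the explicit cast back to `float32` (the function's default `dtype`) matters for memory and for downstream framework tensors.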
```python
def to_pil_image(
    image: Union[np.ndarray, PIL.Image.Image, "torch.Tensor", "tf.Tensor", "jnp.Tensor"],
    do_rescale: Optional[bool] = None,
) -> PIL.Image.Image:
    """
    Converts `image` to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if
    needed.

    Args:
        image (`PIL.Image.Image` or `numpy.ndarray` or `torch.Tensor` or `tf.Tensor`):
            The image to convert to the `PIL.Image` format.
        do_rescale (`bool`, *optional*):
            Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default
            to `True` if the image type is a floating type, `False` otherwise.

    Returns:
        `PIL.Image.Image`: The converted image.
    """
    if isinstance(image, PIL.Image.Image):
        return image

    # Convert all tensors to numpy arrays before converting to PIL image
    if is_torch_tensor(image) or is_tf_tensor(image):
        image = image.numpy()
    elif is_jax_tensor(image):
        image = np.array(image)
    elif not isinstance(image, np.ndarray):
        raise ValueError(f"Input image type not supported: {type(image)}")

    # If the channel has been moved to the first dim, we put it back at the end.
    image = to_channel_dimension_format(image, ChannelDimension.LAST)

    # PIL.Image can only store uint8 values, so we rescale the image to be between 0 and 255 if needed.
    do_rescale = isinstance(image.flat[0], float) if do_rescale is None else do_rescale
    if do_rescale:
        image = rescale(image, 255)
    image = image.astype(np.uint8)
    return PIL.Image.fromarray(image)
```
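The `do_rescale` default hinges on the element type of the array: float images are assumed to lie in [0, 1] and are scaled up to 0–255 before the uint8 cast. A standalone sketch of that decision with the PIL call omitted (the helper name is mine):

```python
import numpy as np

float_image = np.array([[0.0, 0.5, 1.0]])            # float64 elements
int_image = np.array([[0, 127, 255]], dtype=np.uint8)


def infer_do_rescale(image):
    # Same check as the diff: float elements trigger the 0-255 rescale.
    return isinstance(image.flat[0], float)


converted = (float_image * 255).astype(np.uint8) if infer_do_rescale(float_image) else float_image.astype(np.uint8)
```

One subtlety of this check: `np.float64` scalars are a subclass of Python `float`, but `np.float32` scalars are not, so a float32 array would skip the rescale under `isinstance(..., float)`.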
> **Contributor:** The draft PR looks good to me! I have a question though, do we need two methods for resizing (…)? The …
>
> **Author:** The two main reasons for having both … I don't feel very strongly about this and happy to move the finding-shape logic to be internal to the method again. For …
>
> **Contributor:** Where is this function used?
>
> **Author:** It's not yet, because GLPN does its own logic to find the output image size based on …

```python
def get_resize_output_image_size(
    input_image: np.ndarray,
    size: Union[int, Tuple[int, int], List[int], Tuple[int]],
    default_to_square: bool = True,
    max_size: Optional[int] = None,
) -> tuple:
    """
    Find the target (height, width) dimension of the output image after resizing given the input image and the desired
    size.

    Args:
        input_image (`np.ndarray`):
            The image to resize.
        size (`int` or `Tuple[int, int]` or `List[int]` or `Tuple[int]`):
            The size to use for resizing the image. If `size` is a sequence like (h, w), the output size will be
            matched to this.

            If `size` is an int and `default_to_square` is `True`, then the image will be resized to (size, size). If
            `size` is an int and `default_to_square` is `False`, then the smaller edge of the image will be matched to
            this number, i.e. if height > width, then the image will be rescaled to (size * height / width, size).
        default_to_square (`bool`, *optional*, defaults to `True`):
            How to convert `size` when it is a single int. If set to `True`, the `size` will be converted to a square
            (`size`, `size`). If set to `False`, will replicate
            [`torchvision.transforms.Resize`](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Resize)
            with support for resizing only the smallest edge and providing an optional `max_size`.
        max_size (`int`, *optional*):
            The maximum allowed for the longer edge of the resized image: if the longer edge of the image is greater
            than `max_size` after being resized according to `size`, then the image is resized again so that the longer
            edge is equal to `max_size`. As a result, `size` might be overruled, i.e. the smaller edge may be shorter
            than `size`. Only used if `default_to_square` is `False`.

    Returns:
        `tuple`: The target (height, width) dimension of the output image after resizing.
    """
    if isinstance(size, (tuple, list)):
        if len(size) == 2:
            return tuple(size)
        elif len(size) == 1:
            # Perform same logic as if size was an int
            size = size[0]
        else:
            raise ValueError("size must have 1 or 2 elements if it is a list or tuple")

    if default_to_square:
        return (size, size)

    height, width = get_image_size(input_image)
    short, long = (width, height) if width <= height else (height, width)
    requested_new_short = size

    if short == requested_new_short:
        return (height, width)

    new_short, new_long = requested_new_short, int(requested_new_short * long / short)

    if max_size is not None:
        if max_size <= requested_new_short:
            raise ValueError(
                f"max_size = {max_size} must be strictly greater than the requested "
                f"size for the smaller edge size = {size}"
            )
        if new_long > max_size:
            new_short, new_long = int(max_size * new_short / new_long), max_size

    return (new_long, new_short) if width <= height else (new_short, new_long)
```
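The shortest-edge branch of `get_resize_output_image_size` is easy to trace by hand. A standalone re-derivation of the int-size path (the helper name is mine, and the early-return and validation branches are omitted for brevity):

```python
def shortest_edge_resize(height, width, size, max_size=None):
    """Match the short edge to `size`, scale the long edge to preserve the
    aspect ratio, then cap the long edge at `max_size` if one is given."""
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    if max_size is not None and new_long > max_size:
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if width <= height else (new_short, new_long)


# A 480x640 landscape image with size=256: the height (short edge) becomes 256
# and the width scales to int(256 * 640 / 480) = 341, keeping the aspect ratio.
```

With `max_size=300` on the same image, the 341-pixel long edge exceeds the cap, so both edges shrink again: the long edge is pinned to 300 and the short edge drops below the requested 256, which is exactly the "`size` might be overruled" case the docstring warns about.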
> **Contributor:** Given that we use Pillow for resizing, maybe it makes more sense to let …
>
> **Contributor:** Also, won't this method support the same behaviour as torchvision's resize? i.e. when `size` is an int, it only resizes the smaller edge of the image.
>
> **Author:** As the input and output to the function are numpy arrays, I think that would be confusing. Ideally the user wouldn't have any knowledge about the library used to resize. I moved the logic to find the dimensions of the output image into …

```python
def resize(
    image,
    size: Tuple[int, int],
    resample=PIL.Image.BILINEAR,
    data_format: Optional[ChannelDimension] = None,
    return_numpy: bool = True,
) -> np.ndarray:
    """
    Resizes `image` to (h, w) specified by `size` using the PIL library.

    Args:
        image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`):
            The image to resize.
        size (`Tuple[int, int]`):
            The size to use for resizing the image.
        resample (`int`, *optional*, defaults to `PIL.Image.BILINEAR`):
            The filter to use for resampling.
        data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the output image. If `None`, will use the inferred format from the input.
        return_numpy (`bool`, *optional*, defaults to `True`):
            Whether or not to return the resized image as a numpy array. If `False`, a `PIL.Image.Image` object is
            returned.

    Returns:
        `np.ndarray`: The resized image.
    """
    if not len(size) == 2:
        raise ValueError("size must have 2 elements")

    # For all transformations, we want to keep the same data format as the input image unless otherwise specified.
    # The resized image from PIL will always have channels last, so find the input format first.
    data_format = infer_channel_dimension_format(image) if data_format is None else data_format

    # To maintain backwards compatibility with the resizing done in previous image feature extractors, we use
    # the pillow library to resize the image and then convert back to numpy
    if not isinstance(image, PIL.Image.Image):
        # PIL expects image to have channels last
        image = to_channel_dimension_format(image, ChannelDimension.LAST)
        image = to_pil_image(image)
    height, width = size
    # PIL images are in the format (width, height)
    resized_image = image.resize((width, height), resample=resample)

    if return_numpy:
        resized_image = np.array(resized_image)
        resized_image = to_channel_dimension_format(resized_image, data_format)
    return resized_image
```
> **Reviewer:** Ok, nice :) Could possibly also include a `postprocess` method, although I'm wondering whether that could be a method on the model (as it's oftentimes framework-specific). Definitely stuff for later :) cc @alaradirik
>
> **Author:** I didn't include a `postprocess` method, as one model can have many different downstream tasks. I was thinking of using a structure similar to the current feature extractors, where we have `postprocess_task` methods. What do you think? I don't think the postprocessing methods should fall under the model if it's outside of what's needed for training. I wouldn't want to have to load a model in order to process outputs. Re framework-specific, we'll have to think about how to handle this, as we need to be able to support and have consistent outputs for the different framework implementations. If it's necessary to use specific libraries, we could have the image processor call the specific implementation, e.g. `postprocess_instance_segmentation` calls `_postprocess_instance_segmentation_pytorch`.
>
> **Contributor:** @amyeroberts @NielsRogge I'm a bit late to the conversation, but I agree that postprocessing methods shouldn't fall under the modeling files. Adding `postprocess_task` methods sounds good to me! We just need to ensure consistent inputs and outputs for these where applicable.
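The dispatch pattern floated in this thread — a task-specific `postprocess_<task>` method routing to a framework-specific helper — could look roughly like the sketch below. All class and method names here are hypothetical, following the `postprocess_instance_segmentation` example from the discussion; only a numpy branch is implemented so the sketch stays self-contained:

```python
import numpy as np


class ToySegmentationProcessor:
    """Hypothetical sketch of the proposed postprocess_<task> routing."""

    def postprocess_instance_segmentation(self, logits):
        # Route to a framework-specific helper based on the tensor type; the
        # thread suggests names like _postprocess_instance_segmentation_pytorch.
        if isinstance(logits, np.ndarray):
            return self._postprocess_instance_segmentation_numpy(logits)
        raise TypeError(f"Unsupported tensor type: {type(logits)}")

    def _postprocess_instance_segmentation_numpy(self, logits):
        # Toy postprocessing: argmax over the class axis gives per-pixel labels.
        return logits.argmax(axis=-1)


processor = ToySegmentationProcessor()
masks = processor.postprocess_instance_segmentation(np.eye(3)[None])
```

The public method owns the task contract (consistent inputs and outputs across frameworks), while the private helpers are free to lean on framework-specific libraries.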