Merged
Commits
206 commits
a94c537
Adapt FE methods to transforms library
amyeroberts Jul 27, 2022
932f291
Mixin for saving the image processor
amyeroberts Jul 27, 2022
54aed8b
Base processor skeleton
amyeroberts Jul 27, 2022
ba55c89
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
4b430d4
Initial image processor for GLPN
amyeroberts Jul 27, 2022
b1c8b59
Remove accidental import
amyeroberts Jul 27, 2022
daf069a
Fixup and docs
amyeroberts Jul 28, 2022
95b4a6a
Mixin for saving the image processor
amyeroberts Jul 27, 2022
6f7ef56
Fixup and docs
amyeroberts Jul 28, 2022
b9ce4a0
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
f02ae6a
Merge branch 'image-processor-mixin' of github.com:amyeroberts/transf…
amyeroberts Jul 28, 2022
6b678fb
Fixup and docs
amyeroberts Jul 28, 2022
db93437
Fixup and docs
amyeroberts Jul 28, 2022
bd890d5
Fixup and docs
amyeroberts Jul 28, 2022
4b27a34
Fixup and docs
amyeroberts Jul 28, 2022
ff0d49e
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
2c2fa9a
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
b9f7837
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
346270d
Resolve conflicts
amyeroberts Jul 28, 2022
7faf2e6
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
ccc15fb
Fixup and docs
amyeroberts Jul 28, 2022
c8f8eb6
Fixup and docs
amyeroberts Jul 28, 2022
90093f4
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
d89c051
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
9bc9157
Fixup and docs
amyeroberts Jul 28, 2022
6ec382a
Mixin for saving the image processor
amyeroberts Jul 27, 2022
56ee6ad
Fixup and docs
amyeroberts Jul 28, 2022
38ebb50
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 28, 2022
6b88d5f
Add rescale back and remove ImageType
amyeroberts Jul 28, 2022
67077f1
fix import mistake
amyeroberts Jul 28, 2022
82712c7
Fix enum var reference
amyeroberts Jul 28, 2022
71d666d
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Jul 28, 2022
fb6438c
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
ffe71b6
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Jul 28, 2022
cc480e8
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 28, 2022
b997a98
Can transform and specify image data format
amyeroberts Jul 28, 2022
9106443
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Jul 28, 2022
1b3cf65
Remove redundant function
amyeroberts Jul 28, 2022
2860460
Update reference
amyeroberts Jul 28, 2022
3e1077b
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Jul 28, 2022
4264d1a
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
fb5dcd6
Merge in branch and remove conflicts
amyeroberts Jul 28, 2022
43f561d
Add in rescaling
amyeroberts Jul 29, 2022
60c56e5
Data format flag for rescale
amyeroberts Jul 29, 2022
9294dbc
Fix typo
amyeroberts Jul 29, 2022
654cf93
Fix dimension check
amyeroberts Jul 29, 2022
1360732
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Jul 29, 2022
936de65
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 29, 2022
627c048
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Jul 29, 2022
1b64c80
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 29, 2022
88b82e9
Fixes to make IP and FE outputs match
amyeroberts Jul 29, 2022
3ea27aa
Add tests for transforms
amyeroberts Jul 29, 2022
84fdd07
Add test for utils
amyeroberts Jul 29, 2022
10d56b1
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Jul 29, 2022
392e980
Update some docstrings
amyeroberts Aug 2, 2022
2117b94
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 2, 2022
68de952
Resolve merge conflicts
amyeroberts Aug 2, 2022
5208680
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 2, 2022
a28ac88
Make sure in channels last before converting to PIL
amyeroberts Aug 2, 2022
2ead9e5
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Aug 2, 2022
9514d54
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 2, 2022
8f63b76
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 2, 2022
46a9c74
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 2, 2022
082e4ff
Remove default to numpy batching
amyeroberts Aug 2, 2022
bf73358
Fix up
amyeroberts Aug 3, 2022
34b6b2f
Add docstring and model_input_types
amyeroberts Aug 4, 2022
7150293
Use feature processor config from hub
amyeroberts Aug 4, 2022
8678c13
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 4, 2022
937884c
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 4, 2022
a1b681a
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 4, 2022
b1db434
Alias GLPN feature extractor to image processor
amyeroberts Aug 4, 2022
f0c14ee
Alias feature extractor mixin
amyeroberts Aug 5, 2022
952c2a0
Resolve merge conflicts
amyeroberts Aug 5, 2022
2f0fa0b
Resolve merge conflicts
amyeroberts Aug 5, 2022
e6233cc
Resolve merge conflicts
amyeroberts Aug 5, 2022
ddc8cf9
Merge branch 'image-processor-glpn' into rename-fe-to-ip-glpn
amyeroberts Aug 5, 2022
5407de6
Merge in main
amyeroberts Aug 5, 2022
f1cf228
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Aug 5, 2022
bd0afd6
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 5, 2022
a6f69bc
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 5, 2022
a7af81f
Merge and resolve conflicts
amyeroberts Aug 5, 2022
ad58bd9
Merge branch 'image-processor-glpn' into rename-fe-to-ip-glpn
amyeroberts Aug 5, 2022
affb945
Add return_numpy=False flag for resize
amyeroberts Aug 7, 2022
5891dd8
Merge branch 'image-transforms-library' into image-processor-mixin
amyeroberts Aug 7, 2022
b66d0f6
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 7, 2022
8b73f89
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 7, 2022
ae6030c
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 7, 2022
78bdfb3
Merge branch 'image-processor-glpn' into rename-fe-to-ip-glpn
amyeroberts Aug 7, 2022
7a4d22a
Fix up
amyeroberts Aug 8, 2022
994e040
Fix up
amyeroberts Aug 8, 2022
42c23bd
Use different frameworks safely
amyeroberts Aug 8, 2022
05c65f6
Safely import PIL
amyeroberts Aug 8, 2022
feb9556
Call function checking if PIL available
amyeroberts Aug 8, 2022
a30b007
Only import if vision available
amyeroberts Aug 8, 2022
fd7b6c7
Address Sylvain PR comments
amyeroberts Aug 9, 2022
790c2c6
Apply suggestions from code review
amyeroberts Aug 10, 2022
2e929cf
Update src/transformers/image_transforms.py
amyeroberts Aug 12, 2022
ff04de3
Update src/transformers/models/glpn/feature_extraction_glpn.py
amyeroberts Aug 12, 2022
cb1dcd8
Merge pull request #25 from amyeroberts/image-processor-mixin
amyeroberts Aug 16, 2022
ae35873
Add in docstrings
amyeroberts Aug 17, 2022
4fff267
Merge pull request #23 from amyeroberts/image-processor-glpn
amyeroberts Aug 17, 2022
62c6e55
Merge pull request #26 from amyeroberts/image-batch-feature
amyeroberts Aug 17, 2022
271a09d
Merge pull request #24 from amyeroberts/rename-fe-to-ip-glpn
amyeroberts Aug 17, 2022
b7edea0
Fix TFSwinSelfAttention to have relative position index as non-traina…
harrydrippin Aug 5, 2022
8cacf30
Refactor `TFSwinLayer` to increase serving compatibility (#18352)
harrydrippin Aug 5, 2022
e385c5a
Add TF prefix to TF-Res test class (#18481)
ydshieh Aug 5, 2022
ed4b059
Remove py.typed (#18485)
sgugger Aug 5, 2022
553be89
Fix pipeline tests (#18487)
sgugger Aug 5, 2022
cfa16eb
Use new huggingface_hub tools for download models (#18438)
sgugger Aug 5, 2022
7472b39
Fix `test_dbmdz_english` by updating expected values (#18482)
ydshieh Aug 5, 2022
35a534a
Move cache folder to huggingface/hub for consistency with hf_hub (#18…
sgugger Aug 5, 2022
2c96675
Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#…
ydshieh Aug 5, 2022
fc87969
Forgot one new_ for cache migration
sgugger Aug 5, 2022
e8f5772
disable Onnx test for google/long-t5-tglobal-base (#18454)
ydshieh Aug 5, 2022
707c0ff
Typo reported by Joel Grus on TWTR (#18493)
julien-c Aug 5, 2022
0ff35a8
Just re-reading the whole doc every couple of months 😬 (#18489)
julien-c Aug 6, 2022
1d34656
`transformers-cli login` => `huggingface-cli login` (#18490)
julien-c Aug 6, 2022
a610155
Add seed setting to image classification example (#18519)
regisss Aug 8, 2022
2f493d5
[DX fix] Fixing QA pipeline streaming a dataset. (#18516)
Narsil Aug 8, 2022
80c33f8
Clean up hub (#18497)
sgugger Aug 8, 2022
c6e979f
update fsdp docs (#18521)
pacman100 Aug 8, 2022
2884397
Fix compatibility with 1.12 (#17925)
sgugger Aug 8, 2022
e9ba674
Remove debug statement
sgugger Aug 8, 2022
dcb1685
Specify en in doc-builder README example (#18526)
ankrgyl Aug 8, 2022
7072f66
New cache fixes: add safeguard before looking in folders (#18522)
sgugger Aug 8, 2022
c5e228e
unpin resampy (#18527)
ydshieh Aug 8, 2022
6952e9b
✨ update to use interlibrary links instead of Markdown (#18500)
stevhliu Aug 8, 2022
8a18ad9
Add example of multimodal usage to pipeline tutorial (#18498)
stevhliu Aug 8, 2022
e9c67f7
[VideoMAE] Add model to doc tests (#18523)
NielsRogge Aug 8, 2022
7aa5bfd
Update perf_train_gpu_one.mdx (#18532)
mishig25 Aug 8, 2022
5606dba
Update no_trainer.py scripts to include accelerate gradient accumulat…
Rasmusafj Aug 8, 2022
5b29a58
Add Spanish translation of converting_tensorflow_models.mdx (#18512)
donelianc Aug 8, 2022
defa14c
Spanish translation of summarization.mdx (#15947) (#18477)
AguilaCudicio Aug 8, 2022
87271d1
Let's not cast them all (#18471)
younesbelkada Aug 8, 2022
a9b2968
fix: data2vec-vision Onnx ready-made configuration. (#18427)
NikeNano Aug 9, 2022
24f688f
Add mt5 onnx config (#18394)
chainyo Aug 9, 2022
c465437
Minor update of `run_call_with_unpacked_inputs` (#18541)
ydshieh Aug 9, 2022
cafb76e
BART - Fix attention mask device issue on copied models (#18540)
younesbelkada Aug 9, 2022
a25b1b3
Adding a new `align_to_words` param to qa pipeline. (#18010)
Narsil Aug 9, 2022
fdd9c95
📝 update metric with evaluate (#18535)
stevhliu Aug 9, 2022
5e8a3d4
Restore _init_weights value in no_init_weights (#18504)
YouJiacheng Aug 9, 2022
ba98271
Clean up comment
sgugger Aug 9, 2022
3a70590
📝 update documentation build section (#18548)
stevhliu Aug 9, 2022
09f36ba
`bitsandbytes` - `Linear8bitLt` integration into `transformers` model…
younesbelkada Aug 10, 2022
ca3833e
TF: XLA-trainable DeBERTa v2 (#18546)
gante Aug 10, 2022
b84379c
Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
sgugger Aug 10, 2022
8d7065e
TF Examples Rewrite (#18451)
Rocketknight1 Aug 10, 2022
c9c5420
Use commit hash to look in cache instead of calling head (#18534)
sgugger Aug 10, 2022
5d39088
`pipeline` support for `device="mps"` (or any other string) (#18494)
julien-c Aug 10, 2022
0544879
Update philosophy to include other preprocessing classes (#18550)
stevhliu Aug 10, 2022
8b98733
Properly move cache when it is not in default path (#18563)
sgugger Aug 10, 2022
c2fc948
Adds CLIP to models exportable with ONNX (#18515)
unography Aug 10, 2022
fe29e4c
raise atol for MT5OnnxConfig (#18560)
ydshieh Aug 10, 2022
793d978
fix string (#18568)
mrwyattii Aug 10, 2022
8aea331
Segformer TF: fix output size in documentation (#18572)
joihn Aug 11, 2022
db07c44
Fix resizing bug in OWL-ViT (#18573)
alaradirik Aug 11, 2022
5a29d4f
Fix LayoutLMv3 documentation (#17932)
pocca2048 Aug 11, 2022
ad4215f
Skip broken tests
sgugger Aug 11, 2022
a272ed0
Change BartLearnedPositionalEmbedding's forward method signature to s…
donebydan Aug 11, 2022
6d8ab27
german docs translation (#18544)
flozi00 Aug 11, 2022
9d87c2d
Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
iiLaurens Aug 11, 2022
1c38f1a
[FX] _generate_dummy_input supports audio-classification models for l…
michaelbenayoun Aug 11, 2022
529ac2b
Fix docstrings with last version of hf-doc-builder styler (#18581)
sgugger Aug 11, 2022
8a8a9a1
Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxm…
dependabot[bot] Aug 11, 2022
5a46799
Bump nbconvert in /examples/research_projects/visual_bert (#18566)
dependabot[bot] Aug 11, 2022
5d1df72
fix owlvit tests, update docstring examples (#18586)
alaradirik Aug 11, 2022
f03866f
Return the permuted hidden states if return_dict=True (#18578)
amyeroberts Aug 11, 2022
261f480
Load sharded pt to flax (#18419)
ArthurZucker Aug 12, 2022
ff90f49
Add type hints for ViLT models (#18577)
donelianc Aug 12, 2022
1e7062a
update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
sywangyi Aug 12, 2022
c472b59
typos (#18594)
stas00 Aug 12, 2022
8cd549f
FSDP bug fix for `load_state_dict` (#18596)
pacman100 Aug 12, 2022
b0dea99
Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#…
ydshieh Aug 12, 2022
b93957a
Generate: validate `model_kwargs` (and catch typos in generate argume…
gante Aug 12, 2022
b881653
Supporting seq2seq models for `bitsandbytes` integration (#18579)
younesbelkada Aug 12, 2022
a9a0e18
Add Donut (#18488)
NielsRogge Aug 12, 2022
089ad23
Fix URLs (#18604)
NielsRogge Aug 12, 2022
f1590b2
Update BLOOM parameter counts (#18531)
Muennighoff Aug 12, 2022
b2fe78b
[doc] fix anchors (#18591)
stas00 Aug 12, 2022
c9d8c70
[fsmt] deal with -100 indices in decoder ids (#18592)
stas00 Aug 12, 2022
af92441
small change (#18584)
younesbelkada Aug 12, 2022
2287492
Flax Remat for LongT5 (#17994)
KMFODA Aug 14, 2022
c97b085
mac m1 `mps` integration (#18598)
pacman100 Aug 16, 2022
ea2c992
Change scheduled CIs to use torch 1.12.1 (#18644)
ydshieh Aug 16, 2022
771d6c0
Add checks for some workflow jobs (#18583)
ydshieh Aug 16, 2022
b53ef28
TF: Fix generation repetition penalty with XLA (#18648)
gante Aug 16, 2022
1769f66
Update longt5.mdx (#18634)
flozi00 Aug 16, 2022
b2dc2f3
Update run_translation_no_trainer.py (#18637)
zhoutang776 Aug 16, 2022
ab9d3b4
[bnb] Minor modifications (#18631)
younesbelkada Aug 16, 2022
a316ea3
Examples: add Bloom support for token classification (#18632)
stefan-it Aug 17, 2022
c6751ea
Fix Yolos ONNX export test (#18606)
ydshieh Aug 17, 2022
b7046bc
Fixup
amyeroberts Aug 17, 2022
f8a6b87
Fix up
amyeroberts Aug 17, 2022
b6fd4e3
Resolve conflicts
amyeroberts Aug 17, 2022
a37bce3
Move PIL default arguments inside function for safe imports
amyeroberts Aug 17, 2022
6ec9dbb
Add image utils to toctree
amyeroberts Aug 17, 2022
7693600
Update `rescale` method to reflect changes in #18677
amyeroberts Aug 18, 2022
464a4f2
Update docs/source/en/internal/image_processing_utils.mdx
amyeroberts Aug 23, 2022
713e958
Address Niels PR comments
amyeroberts Aug 23, 2022
6ec76ff
Apply suggestions from code review - remove defaults to None
amyeroberts Sep 2, 2022
adc0f9d
Merge branch 'main' into image-transforms-library
amyeroberts Sep 28, 2022
df81b6a
Merge branch 'main' into image-transforms-library
amyeroberts Oct 12, 2022
48a07a1
Fix docstrings and revert to PIL.Image.XXX resampling
amyeroberts Oct 12, 2022
8785229
Some more docstrings and PIL.Image tidy up
amyeroberts Oct 12, 2022
d44fe63
Reorganise arguments so flags by modifiers
amyeroberts Oct 12, 2022
83330ef
Few last docstring fixes
amyeroberts Oct 12, 2022
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -521,6 +521,8 @@
title: Utilities for Trainer
- local: internal/generation_utils
title: Utilities for Generation
- local: internal/image_processing_utils
title: Utilities for Image Processors
- local: internal/file_utils
title: General Utilities
title: Internal Helpers
30 changes: 30 additions & 0 deletions docs/source/en/internal/image_processing_utils.mdx
@@ -0,0 +1,30 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Utilities for Image Processors

This page lists all the utility functions used by the image processors, mainly the functional
transformations used to process the images.

Most of those are only useful if you are studying the code of the image processors in the library.

## Image Transformations

[[autodoc]] image_transforms.rescale

[[autodoc]] image_transforms.resize

[[autodoc]] image_transforms.to_pil_image

## ImageProcessorMixin

[[autodoc]] image_processing_utils.ImageProcessorMixin
4 changes: 4 additions & 0 deletions src/transformers/__init__.py
@@ -680,6 +680,8 @@
name for name in dir(dummy_vision_objects) if not name.startswith("_")
]
else:
_import_structure["image_processing_utils"] = ["ImageProcessorMixin"]
_import_structure["image_transforms"] = ["rescale", "resize", "to_pil_image"]
_import_structure["image_utils"] = ["ImageFeatureExtractionMixin"]
_import_structure["models.beit"].append("BeitFeatureExtractor")
_import_structure["models.clip"].append("CLIPFeatureExtractor")
@@ -3648,6 +3650,8 @@
except OptionalDependencyNotAvailable:
from .utils.dummy_vision_objects import *
else:
from .image_processing_utils import ImageProcessorMixin
from .image_transforms import rescale, resize, to_pil_image
from .image_utils import ImageFeatureExtractionMixin
from .models.beit import BeitFeatureExtractor
from .models.clip import CLIPFeatureExtractor
54 changes: 54 additions & 0 deletions src/transformers/image_processing_utils.py
@@ -0,0 +1,54 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .feature_extraction_utils import BatchFeature as BaseBatchFeature
from .feature_extraction_utils import FeatureExtractionMixin
from .utils import logging


logger = logging.get_logger(__name__)


# TODO: Move BatchFeature to be imported by both feature_extraction_utils and image_processing_utils
# We override the class string here, but logic is the same.
class BatchFeature(BaseBatchFeature):
r"""
Holds the output of the image processor specific `__call__` methods.

This class is derived from a python dictionary and can be used as a dictionary.

Args:
data (`dict`):
Dictionary of lists/arrays/tensors returned by the __call__ method ('pixel_values', etc.).
tensor_type (`Union[None, str, TensorType]`, *optional*):
You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
"""


# We use aliasing whilst we phase out the old API. Once feature extractors for vision models
# are deprecated, ImageProcessor mixin will be implemented. Any shared logic will be abstracted out.
ImageProcessorMixin = FeatureExtractionMixin


class BaseImageProcessor(ImageProcessorMixin):
def __init__(self, **kwargs):
super().__init__(**kwargs)

def __call__(self, images, **kwargs) -> BatchFeature:
return self.preprocess(images, **kwargs)

def preprocess(self, images, **kwargs) -> BatchFeature:
Contributor:

Ok, nice :) could possibly also include a postprocess method, although I'm wondering whether that could be a method on the model (as it's oftentimes framework-specific).

Definitely stuff for later :)

cc @alaradirik

Contributor Author:

I didn't include a postprocess method, as one model can have many different downstream tasks. I was thinking of using a structure similar to the current feature extractors, where we have postprocess_task methods. What do you think?

I don't think the postprocessing methods should fall under the model if they're outside of what's needed for training. I wouldn't want to have to load a model in order to process outputs. Re: framework-specific, we'll have to think about how to handle this, as we need to be able to support and have consistent outputs for the different framework implementations. If it's necessary to use specific libraries, we could have the image processor call the specific implementation, e.g. postprocess_instance_segmentation calls _postprocess_instance_segmentation_pytorch.

alaradirik (Contributor), Sep 28, 2022:

@amyeroberts @NielsRogge I'm a bit late to the conversation but I agree that postprocessing methods shouldn't fall under the modeling files.

Adding postprocess_task methods sounds good to me! We just need to ensure consistent inputs and outputs for these where applicable.

raise NotImplementedError("Each image processor must implement its own preprocess method")
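The subclassing contract above (`__call__` delegates to `preprocess`, which every concrete processor must override) can be sketched standalone. The class names below are illustrative stand-ins, not the PR's actual classes, and the toy `preprocess` just wraps its input:

```python
# Minimal sketch of the BaseImageProcessor contract: __call__ delegates to
# preprocess, which each subclass must override or a NotImplementedError is raised.
class SketchBaseImageProcessor:
    def __call__(self, images, **kwargs):
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs):
        raise NotImplementedError("Each image processor must implement its own preprocess method")


class SketchGLPNImageProcessor(SketchBaseImageProcessor):
    def preprocess(self, images, **kwargs):
        # A real processor would resize/rescale here and return a BatchFeature;
        # we simply wrap the input in a dict for illustration.
        return {"pixel_values": images}


processor = SketchGLPNImageProcessor()
outputs = processor([[0, 1], [2, 3]])  # routed through __call__ to preprocess
```

Calling the base class directly raises, which is exactly the behaviour the real `preprocess` stub enforces.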
259 changes: 259 additions & 0 deletions src/transformers/image_transforms.py
@@ -0,0 +1,259 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING, List, Optional, Tuple, Union

import numpy as np

from transformers.utils.import_utils import is_flax_available, is_tf_available, is_torch_available, is_vision_available


if is_vision_available():
import PIL

from .image_utils import (
ChannelDimension,
get_image_size,
infer_channel_dimension_format,
is_jax_tensor,
is_tf_tensor,
is_torch_tensor,
)


if TYPE_CHECKING:
if is_torch_available():
import torch
if is_tf_available():
import tensorflow as tf
if is_flax_available():
import jax.numpy as jnp


def to_channel_dimension_format(image: np.ndarray, channel_dim: Union[ChannelDimension, str]) -> np.ndarray:
"""
Converts `image` to the channel dimension format specified by `channel_dim`.

Args:
image (`numpy.ndarray`):
The image to have its channel dimension set.
channel_dim (`ChannelDimension`):
The channel dimension format to use.

Returns:
`np.ndarray`: The image with the channel dimension set to `channel_dim`.
"""
if not isinstance(image, np.ndarray):
raise ValueError(f"Input image must be of type np.ndarray, got {type(image)}")

current_channel_dim = infer_channel_dimension_format(image)
target_channel_dim = ChannelDimension(channel_dim)
if current_channel_dim == target_channel_dim:
return image

if target_channel_dim == ChannelDimension.FIRST:
image = image.transpose((2, 0, 1))
elif target_channel_dim == ChannelDimension.LAST:
image = image.transpose((1, 2, 0))
else:
raise ValueError(f"Unsupported channel dimension format: {channel_dim}")

return image
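As a quick sanity check of the transpose logic above, here is the same channels-last/channels-first conversion applied to a toy array. This is a standalone numpy sketch mirroring the two branches, not a call into the library function:

```python
import numpy as np

# Channels-last (H, W, C) -> channels-first (C, H, W), matching the
# ChannelDimension.FIRST branch above, then back again (LAST branch).
image_hwc = np.zeros((4, 6, 3))            # height=4, width=6, 3 channels
image_chw = image_hwc.transpose((2, 0, 1))
print(image_chw.shape)                     # (3, 4, 6)

restored = image_chw.transpose((1, 2, 0))
print(restored.shape)                      # (4, 6, 3)
```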


def rescale(
image: np.ndarray, scale: float, data_format: Optional[ChannelDimension] = None, dtype=np.float32
) -> np.ndarray:
"""
Rescales `image` by `scale`.

Args:
image (`np.ndarray`):
The image to rescale.
scale (`float`):
The scale to use for rescaling the image.
data_format (`ChannelDimension`, *optional*):
The channel dimension format of the image. If not provided, it will be the same as the input image.
dtype (`np.dtype`, *optional*, defaults to `np.float32`):
The dtype of the output image. Defaults to `np.float32`. Used for backwards compatibility with feature
extractors.

Returns:
`np.ndarray`: The rescaled image.
"""
if not isinstance(image, np.ndarray):
raise ValueError(f"Input image must be of type np.ndarray, got {type(image)}")

rescaled_image = image * scale
if data_format is not None:
rescaled_image = to_channel_dimension_format(rescaled_image, data_format)
rescaled_image = rescaled_image.astype(dtype)
return rescaled_image


def to_pil_image(
image: Union[np.ndarray, PIL.Image.Image, "torch.Tensor", "tf.Tensor", "jnp.ndarray"],
do_rescale: Optional[bool] = None,
) -> PIL.Image.Image:
"""
Converts `image` to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if
needed.

Args:
image (`PIL.Image.Image` or `numpy.ndarray` or `torch.Tensor` or `tf.Tensor`):
The image to convert to the `PIL.Image` format.
do_rescale (`bool`, *optional*):
Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default
to `True` if the image type is a floating type, `False` otherwise.

Returns:
`PIL.Image.Image`: The converted image.
"""
if isinstance(image, PIL.Image.Image):
return image

# Convert all tensors to numpy arrays before converting to PIL image
if is_torch_tensor(image) or is_tf_tensor(image):
image = image.numpy()
elif is_jax_tensor(image):
image = np.array(image)
elif not isinstance(image, np.ndarray):
raise ValueError(f"Input image type not supported: {type(image)}")

# If the channel has been moved to the first dim, we put it back at the end.
image = to_channel_dimension_format(image, ChannelDimension.LAST)

# PIL.Image can only store uint8 values, so we rescale the image to be between 0 and 255 if needed.
do_rescale = isinstance(image.flat[0], float) if do_rescale is None else do_rescale
if do_rescale:
image = rescale(image, 255)
image = image.astype(np.uint8)
return PIL.Image.fromarray(image)
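The `do_rescale` default above hinges on the element type: float images are assumed to lie in [0, 1] and are scaled up to 0-255 before the uint8 cast. A numpy-only sketch of that decision (PIL itself is left out; the helper name is hypothetical):

```python
import numpy as np

def needs_rescale(image: np.ndarray) -> bool:
    # Mirrors the default in to_pil_image: rescale when pixel values are floats.
    return isinstance(image.flat[0], float)

float_image = np.array([[0.0, 0.5, 1.0]])
int_image = np.array([[0, 127, 255]], dtype=np.uint8)

print(needs_rescale(float_image))  # True  -> would be multiplied by 255
print(needs_rescale(int_image))    # False -> cast to uint8 directly
```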


def get_resize_output_image_size(
Contributor:

The draft PR looks good to me! I have a question though: do we need two methods for resizing (get_resize_output_image_size and resize)?

The default_to_square argument is useful and I think removing it from the resize method might break some pipelines.

Contributor Author:

The two main reasons for having both get_resize_output_image_size and resize were:

  • Separation of concerns - resize is only responsible for resizing an image, not shape calculations. Coming from TF, I found the resize behaviour surprising; this forces more explicit shape passing when processing.
  • The shape logic is quite complex, and this broke it down into smaller functions. If the shape calculation were moved back inside resize, I would probably still have put the logic in a private function, but this is a style preference.

I don't feel very strongly about this and am happy to move the shape-finding logic back inside the method.

For default_to_square, where do you think this might break? Now that you've highlighted it, I could foresee issues when individual feature extractor methods are used, although I haven't seen this in any of our pipelines or notebooks. Let me know if there's something I'm overlooking, or if you have examples.

NielsRogge (Contributor), Aug 22, 2022:

Where is this function used?

Contributor Author:

It isn't yet, because GLPN does its own logic to find the output image size based on size_divisor.
input_image: np.ndarray,
size: Union[int, Tuple[int, int], List[int], Tuple[int]],
default_to_square: bool = True,
max_size: Optional[int] = None,
) -> tuple:
"""
Find the target (height, width) dimension of the output image after resizing given the input image and the desired
size.

Args:
input_image (`np.ndarray`):
The image to resize.
size (`int` or `Tuple[int, int]` or `List[int]` or `Tuple[int]`):
The size to use for resizing the image. If `size` is a sequence like (h, w), output size will be matched to
this.

If `size` is an int and `default_to_square` is `True`, then image will be resized to (size, size). If
`size` is an int and `default_to_square` is `False`, then smaller edge of the image will be matched to this
number. i.e, if height > width, then image will be rescaled to (size * height / width, size).
default_to_square (`bool`, *optional*, defaults to `True`):
How to convert `size` when it is a single int. If set to `True`, the `size` will be converted to a square
(`size`,`size`). If set to `False`, will replicate
[`torchvision.transforms.Resize`](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Resize)
with support for resizing only the smallest edge and providing an optional `max_size`.
max_size (`int`, *optional*):
The maximum allowed for the longer edge of the resized image: if the longer edge of the image is greater
than `max_size` after being resized according to `size`, then the image is resized again so that the longer
edge is equal to `max_size`. As a result, `size` might be overruled, i.e the smaller edge may be shorter
than `size`. Only used if `default_to_square` is `False`.

Returns:
`tuple`: The target (height, width) dimension of the output image after resizing.
"""
if isinstance(size, (tuple, list)):
if len(size) == 2:
return tuple(size)
elif len(size) == 1:
# Perform same logic as if size was an int
size = size[0]
else:
raise ValueError("size must have 1 or 2 elements if it is a list or tuple")

if default_to_square:
return (size, size)

height, width = get_image_size(input_image)
short, long = (width, height) if width <= height else (height, width)
requested_new_short = size

if short == requested_new_short:
return (height, width)

new_short, new_long = requested_new_short, int(requested_new_short * long / short)

if max_size is not None:
if max_size <= requested_new_short:
raise ValueError(
f"max_size = {max_size} must be strictly greater than the requested "
f"size for the smaller edge size = {size}"
)
if new_long > max_size:
new_short, new_long = int(max_size * new_short / new_long), max_size

return (new_long, new_short) if width <= height else (new_short, new_long)
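A worked example of the smallest-edge branch above (the `default_to_square=False` path), re-derived as a standalone function so the arithmetic is visible. The function name and the numbers are illustrative, not part of the PR:

```python
def smallest_edge_size(height, width, size, max_size=None):
    # Standalone re-derivation of the non-square branch above: match the
    # shorter edge to `size`, scale the longer edge to keep the aspect
    # ratio, then cap the longer edge at `max_size` if one is given.
    short, long = (width, height) if width <= height else (height, width)
    if short == size:
        return (height, width)
    new_short, new_long = size, int(size * long / short)
    if max_size is not None and new_long > max_size:
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if width <= height else (new_short, new_long)

# A 333x500 (h, w) image with size=256: the short edge (333) maps to 256
# and the long edge scales to int(256 * 500 / 333) = 384.
print(smallest_edge_size(333, 500, 256))                # (256, 384)
# With max_size=300 the long edge is capped and the short edge shrinks.
print(smallest_edge_size(333, 500, 256, max_size=300))  # (200, 300)
```

The validation in the real function (raising when `max_size <= size`) is omitted here for brevity.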


def resize(
image,
size: Tuple[int, int],
resample=PIL.Image.BILINEAR,
data_format: Optional[ChannelDimension] = None,
return_numpy: bool = True,
) -> np.ndarray:
"""
Resizes `image` to (h, w) specified by `size` using the PIL library.
NielsRogge (Contributor), Aug 22, 2022:

Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?

NielsRogge (Contributor), Aug 22, 2022:

Also, won't this method support the same behaviour as torchvision's resize? i.e. when size is an int, it only resizes the smaller edge of the image

Contributor Author:

> Given that we use Pillow for resizing, maybe it makes more sense to let size be a tuple of (width, height) rather than the other way around?

As the input and output of the function are numpy arrays, I think that would be confusing. Ideally the user shouldn't need any knowledge of the library used to resize.

> Also, won't this method support the same behaviour as torchvision's resize? i.e. when size is an int, it only resizes the smaller edge of the image

I moved the logic to find the dimensions of the output image into get_resize_output_image_size so that resize does just one thing. This is in part because there's a difference in behaviour between TF and PyTorch. Each model's image processor can then have its own logic within its resize method for finding the output shape, or use get_resize_output_image_size, which has the same behaviour as before.


Args:
image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`):
The image to resize.
size (`Tuple[int, int]`):
The size to use for resizing the image.
resample (`int`, *optional*, defaults to `PIL.Image.BILINEAR`):
The filter to use for resampling.
data_format (`ChannelDimension`, *optional*):
The channel dimension format of the output image. If `None`, will use the inferred format from the input.
return_numpy (`bool`, *optional*, defaults to `True`):
Whether or not to return the resized image as a numpy array. If False a `PIL.Image.Image` object is
returned.

Returns:
`np.ndarray`: The resized image.
"""
if len(size) != 2:
raise ValueError("size must have 2 elements")

# For all transformations, we want to keep the same data format as the input image unless otherwise specified.
# The resized image from PIL will always have channels last, so find the input format first.
data_format = infer_channel_dimension_format(image) if data_format is None else data_format

# To maintain backwards compatibility with the resizing done in previous image feature extractors, we use
# the pillow library to resize the image and then convert back to numpy
if not isinstance(image, PIL.Image.Image):
# PIL expects image to have channels last
image = to_channel_dimension_format(image, ChannelDimension.LAST)
image = to_pil_image(image)
height, width = size
# PIL images are in the format (width, height)
resized_image = image.resize((width, height), resample=resample)

if return_numpy:
resized_image = np.array(resized_image)
resized_image = to_channel_dimension_format(resized_image, data_format)
return resized_image