Add onnx support for VisionEncoderDecoder #19254
Conversation
src/transformers/onnx/__main__.py
Outdated
        # Ensure the requested opset is sufficient
        if args.opset is None:
            args.opset = onnx_config.default_onnx_opset
        if model_kind == "vision-encoder-decoder":
This line creates two workflows based on the type of the model: the first is for VisionEncoderDecoder and the second is the old export workflow. It is currently hardcoded for 'vision-encoder-decoder'. I would appreciate feedback on whether we can do this better.
Another approach I was considering is to make it generic, e.g. model.hasattr(encoder) and model.hasattr(decoder). That should cover all models with two parts in the config, but I am not sure what problems it could cause in the future.
Yeah, ideally we would want something generic that also makes it easy to add the other encoder-decoder models like:
- https://huggingface.co/docs/transformers/v4.22.2/en/model_doc/encoder-decoder
- https://huggingface.co/docs/transformers/v4.22.2/en/model_doc/speech-encoder-decoder
Your idea to use model.hasattr(encoder) won't be sufficient because all seq2seq models like t5 also have this attribute. I've asked internally to see if the core maintainers know how we can distinguish this case.
Internal Slack thread: https://huggingface.slack.com/archives/C01N44FJDHT/p1664797000511649
OK based on internal discussion, using model_kind is the current best choice here. My suggestion would be to create something like ENCODER_DECODER_MODELS = ["vision-encoder-decoder"] and then we can populate it with the other modalities as needed
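As a rough illustration (not the exact PR code) of how that model_kind dispatch could look; the names below are placeholders rather than identifiers from the PR:

```python
# Sketch only: ENCODER_DECODER_MODELS and choose_export_path are illustrative names,
# not the identifiers used in the PR itself.
ENCODER_DECODER_MODELS = ["vision-encoder-decoder"]  # extend with other modalities as needed


def choose_export_path(model_kind: str) -> str:
    """Decide which export workflow a given model kind should take."""
    if model_kind in ENCODER_DECODER_MODELS:
        # two-part models: export encoder and decoder to separate ONNX files
        return "encoder-decoder"
    # everything else keeps the existing single-graph export workflow
    return "single-graph"


assert choose_export_path("vision-encoder-decoder") == "encoder-decoder"
assert choose_export_path("bert") == "single-graph"
```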
Updated
Hello, when I convert the TrOCR model to encoder.onnx and decoder.onnx, why do we use "processor = TrOCRProcessor.from_pretrained("original basemodel")"? Can I use encoder.onnx or decoder.onnx to replace it?
Hi @dcdethan could you please elaborate more on the issue? Why would you like to use the ONNX models to load the preprocessor? Could you please create a new issue in Optimum, as ONNX exports have been migrated to the exporters in Optimum: https://github.com/huggingface/optimum/issues
OK, I will add an issue as you said. I used your example, the onnx_trocr_inference.py link.
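For context: the processor only handles image preprocessing and token decoding, so it still comes from the original checkpoint, while encoder.onnx / decoder.onnx replace the model's forward passes. A rough sketch of how the two fit together (assuming onnxruntime is installed, the exported encoder takes a pixel_values input, and microsoft/trocr-base-printed is just an example checkpoint):

```python
# Rough sketch; the input name "pixel_values" and the checkpoint id are assumptions.
from onnxruntime import InferenceSession
from PIL import Image
from transformers import TrOCRProcessor

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")  # preprocessing only
encoder_session = InferenceSession("encoder_model.onnx")  # exported encoder graph

image = Image.open("sample.png").convert("RGB")
pixel_values = processor(image, return_tensors="np").pixel_values  # numpy input for ORT
encoder_hidden_states = encoder_session.run(None, {"pixel_values": pixel_values})[0]
```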
The documentation is not available anymore as the PR was closed or merged.
lewtun left a comment
Thanks for adding ONNX support for this highly requested model type @mht-sharma 🔥 !!
Overall the PR looks great and there are a few small things we need to correct:
- since this model type is kind of special, I think we should document somewhere in the serialization.mdx docs that these models produce two files
- you need to update the table in serialization.mdx by running make fix-copies
src/transformers/models/vision_encoder_decoder/configuration_vision_encoder_decoder.py
    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
nit: it could be helpful to indicate which hidden state we're referring to explicitly
Suggested change:
-        return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
+        return OrderedDict({"encoder_last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
There is an output name check between the reference model and the ONNX exported model. Thus, if we change the name, it results in an error: Outputs doesn't match between reference model and ONNX exported model: {'encoder_last_hidden_state'}
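To illustrate why: the export validation checks that the declared ONNX output names are a subset of the keys returned by the reference PyTorch model. A toy sketch of that constraint (the real check lives in transformers.onnx and is more involved):

```python
# Toy illustration of the validation constraint; the example reference keys are assumed.
from collections import OrderedDict

# Output names declared by the ONNX config for the encoder:
onnx_outputs = OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})

# Keys returned by the reference PyTorch encoder forward pass (example):
reference_output_keys = {"last_hidden_state", "pooler_output"}

# Renaming the ONNX output to "encoder_last_hidden_state" would break this containment
# check and trigger the "Outputs doesn't match" error quoted above.
assert set(onnx_outputs).issubset(reference_output_keys)
```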
Co-authored-by: lewtun <[email protected]>
Hi, I would like to help here, it would make using Donut with ONNX easier :) @mht-sharma I can help fix the errors that @lewtun commented on.
Hi @WaterKnight1998 thanks for the help. I have updated the PR with a new commit addressing the comments.
lewtun left a comment
Thanks for iterating on this @mht-sharma - it's looking very good!
I've left a bunch of nits, but once they're addressed I think this should be good to merge. Would you mind confirming that all the slow tests pass with these changes?
RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py
Co-authored-by: lewtun <[email protected]>
All tests pass.
lewtun left a comment
Thanks for addressing the final changes @mht-sharma!
This PR LGTM, so gently pinging @sgugger for final approval
sgugger left a comment
Thanks for adding support for those models!
This reverts commit 3080bb4.
@NielsRogge @mht-sharma I referred to some of the issues on model conversion, and I have converted my TrOCR base printed model into two files, encoder.onnx and decoder.onnx. Now, as per this (#19811), which I have followed thoroughly, I was trying with ORTEncoder and ORTDecoder, but there seem to be issues in model.generate and it also gives a backhooks issue. Can you help here?
Hi @umanniyaz, could you share the error message you are getting? It would probably be best to open a new issue or comment on an existing issue with a sample snippet and the error message. I will be adding the support for the above model in
Hi @mht-sharma, I'm getting this error message https://colab.research.google.com/drive/1CxngHndMjLmpRkDreOS2GJSHbtcH1Olr#scrollTo=fc5SIz6uzs6p in Colab. When switching to GPU the result is similar.
@matthewchung74 it works with
@BakingBrains thank you very much. Do you see any improvement in performance with the ONNX model? I don't, for both GPU and CPU. I'm not sure I really understand. If you have any general thoughts, it'd be appreciated. I have sample code below.
@matthewchung74 There won't be that much difference you can see in terms of OCR performance. But there will be a difference in inference speed as well as in the consumption of resources.
Odd, I must be doing something wrong, since my inference using ONNX is 75% slower. Thanks for the response. I'll have to work on it some more.
Inference of ONNX on GPU or CPU? Because in one of my cases the ONNX pipeline on GPU was taking 4.7 sec and the same on CPU was taking 6.2 sec. Regarding the original model, the inference speed of the pipeline on CPU was 7.4 sec, whereas for ONNX it was 4.1 sec.
After getting the ONNX encoder.onnx and decoder.onnx, when running them in the Seq2Seq ONNX setup, model inference improves but the accuracy of OCR gets worse.
Use this for inference @matthewchung74 @BakingBrains: #20644
@umanniyaz I tried a script which is pretty much the same as @mht-sharma has here: https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39. The inference script in #20644 also looks pretty much the same as @mht-sharma's. I'm not really sure what I am missing. Do you have your experiment with the better performance in a Colab or something shareable?
Hi @matthewchung74, I am working on the inference of such models in optimum@588. This implementation would use IOBinding to make the inference faster on GPU. As per the thread above, the model was exported using an
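For reference, a rough sketch of what IOBinding-based inference can look like with onnxruntime; the input/output names and image size are assumptions based on the TrOCR encoder export, not something prescribed by this PR:

```python
# Rough sketch; assumptions: a CUDA build of onnxruntime, an encoder exported with a
# "pixel_values" input and "last_hidden_state" output, and 384x384 TrOCR-style images.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("encoder_model.onnx", providers=["CUDAExecutionProvider"])
binding = session.io_binding()

pixel_values = np.random.rand(1, 3, 384, 384).astype(np.float32)  # stand-in for real inputs
binding.bind_cpu_input("pixel_values", pixel_values)  # copied to the device once
binding.bind_output("last_hidden_state", device_type="cuda")  # keep the output on GPU

session.run_with_iobinding(binding)
encoder_hidden_states = binding.copy_outputs_to_cpu()[0]
```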
Updated! From this original script https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39 as shared by @mht-sharma. It gives good inference and the model's accuracy remains preserved. Try to keep model initialisations at compile time.
@umanniyaz thank you for sharing. I'm still seeing some performance issues. I'm running your code almost as is and getting the following as output. Here is the code. Perhaps the difference is the test image. Is your test image something you can share?
Hello, could someone give me an ONNX model for TrOCR please?
Hi @Kamilya2020 please use the following guide to export the model to ONNX: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model
Hi @mht-sharma, `using Microsoft.AspNetCore.Mvc; namespace OcrSolution.API.Controllers; public IActionResult PerformOCR(IFormFile imageFile) }` here's the error:
My ONNX model is from this link: JaidedAI/EasyOCR#786
I am facing this error when using the ONNX code for "trocr-large-printed". Can someone please help?
Hi @nisargmehta-groww, I used the Python library instead of the CLI to do this conversion. Here's my code (my custom trained TrOCR is saved at /content/content/TrOCR-model-initial-cut and you should be able to just replace that with microsoft/trocr-large-printed when you run it).
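A rough sketch of that Python route using Optimum's ORTModelForVision2Seq (the checkpoint id and output directory are placeholders; older Optimum versions use from_transformers=True instead of export=True):

```python
# Sketch only: replace model_id with your own checkpoint path if needed.
from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor

model_id = "microsoft/trocr-large-printed"
ort_model = ORTModelForVision2Seq.from_pretrained(model_id, export=True)  # convert to ONNX
ort_model.save_pretrained("trocr_onnx")  # writes the encoder/decoder ONNX files

processor = TrOCRProcessor.from_pretrained(model_id)
processor.save_pretrained("trocr_onnx")  # keep the preprocessor next to the ONNX files
```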

What does this PR do?
Fixes #14812
This PR enables the export of VisionEncoderDecoder models to ONNX.
The VisionEncoderDecoder model contains two parts, a vision transformer encoder and a language modelling decoder. Both models are exported to ONNX separately as encoder_model.onnx and decoder_model.onnx. To enable the export of the model, the export call in the main file is segregated based on the model_kind.
Usage
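A typical export invocation would look something like the following (the model id is only an example, and the feature name is assumed to be vision2seq-lm, the feature registered for VisionEncoderDecoder checkpoints):
python -m transformers.onnx --model=microsoft/trocr-base-handwritten --feature=vision2seq-lm onnx/
This writes encoder_model.onnx and decoder_model.onnx to the onnx/ output directory.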