Conversation

@mht-sharma
Contributor

What does this PR do?

Fixes #14812
This PR enables the export of VisionEncoderDecoder models to ONNX.

A VisionEncoderDecoder model contains two parts: a vision transformer encoder and a language-modelling decoder. The two parts are exported to ONNX separately, as encoder_model.onnx and decoder_model.onnx.

To enable the export of the model, the export call in the main file is split into two paths based on the model_kind.

Usage

model_ckpt = "nlpconnect/vit-gpt2-image-captioning"
!python -m transformers.onnx --model={model_ckpt} --feature=vision2seq-lm onnx/ --atol 1e-3
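
For context, here is a rough sketch (not part of this PR) of how the two exported files could be wired together with onnxruntime for plain greedy decoding. The file names follow the defaults above, but the input/output names (pixel_values, encoder_hidden_states, input_ids, logits), the image path, and the absence of an attention mask are assumptions and may need adjusting:

import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import AutoTokenizer, ViTFeatureExtractor

ckpt = "nlpconnect/vit-gpt2-image-captioning"
feature_extractor = ViTFeatureExtractor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

encoder = ort.InferenceSession("onnx/encoder_model.onnx")
decoder = ort.InferenceSession("onnx/decoder_model.onnx")

# Run the vision encoder once per image.
pixel_values = feature_extractor(Image.open("image.jpg"), return_tensors="np").pixel_values
encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]

# Plain greedy decoding (no past key values, no beam search).
input_ids = np.array([[tokenizer.bos_token_id]], dtype=np.int64)
for _ in range(32):
    logits = decoder.run(
        None, {"input_ids": input_ids, "encoder_hidden_states": encoder_hidden_states}
    )[0]
    next_token = logits[:, -1].argmax(-1).reshape(1, 1).astype(np.int64)
    input_ids = np.concatenate([input_ids, next_token], axis=-1)
    if next_token.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))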

# Ensure the requested opset is sufficient
if args.opset is None:
    args.opset = onnx_config.default_onnx_opset
if model_kind == "vision-encoder-decoder":
Contributor Author

This line creates two workflows based on the type of the model: the first is for VisionEncoderDecoder models and the second is the old export workflow. It is currently hardcoded for 'vision-encoder-decoder'. I would appreciate feedback on whether we can do this better.

Another approach I was considering is to make it generic, e.g. model.hasattr(encoder) and model.hasattr(decoder). This should cover all models with two parts in the config, but I am not sure what problems that could cause in the future.

Member

Yeah, ideally we would want something generic that also makes it easy to add the other encoder-decoder models.

Your idea to use model.hasattr(encoder) won't be sufficient because all seq2seq models like t5 also have this attribute. I've asked internally to see if the core maintainers know how we can distinguish this case.

Internal Slack thread: https://huggingface.slack.com/archives/C01N44FJDHT/p1664797000511649

Member

OK based on internal discussion, using model_kind is the current best choice here. My suggestion would be to create something like ENCODER_DECODER_MODELS = ["vision-encoder-decoder"] and then we can populate it with the other modalities as needed
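
A minimal sketch of what that suggestion could look like in the exporter's main script; export_encoder_decoder is a hypothetical helper name, not the exact code that was merged:

ENCODER_DECODER_MODELS = ["vision-encoder-decoder"]

if model_kind in ENCODER_DECODER_MODELS:
    # Export the two sub-models to separate files, e.g. encoder_model.onnx
    # and decoder_model.onnx (export_encoder_decoder is illustrative only).
    export_encoder_decoder(preprocessor, model, onnx_config, args.opset, args.output)
else:
    # Original single-file export path.
    export(preprocessor, model, onnx_config, args.opset, args.output)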

Contributor Author

Updated

Hello, when I convert the TrOCR model to encoder.onnx and decoder.onnx, why do we use "processor = TrOCRProcessor.from_pretrained("original basemodel")"? Can I use encoder.onnx or decoder.onnx to replace it?

Contributor Author

Hi @dcdethan, could you please elaborate on the issue? Why would you like to use the ONNX models to load the preprocessor? Please create a new issue in Optimum, as ONNX exports have been migrated to the exporters in Optimum: https://github.com/huggingface/optimum/issues

OK, I will open an issue as you said. I am using your example, the onnx_trocr_inference.py link.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 30, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

@lewtun lewtun left a comment

Thanks for adding ONNX support for this highly requested model type @mht-sharma 🔥 !!

Overall the PR looks great and there are a few small things we need to correct:

  • since this model type is kind of special, I think we should document somewhere in the serialization.mdx docs that these models produce two files
  • you need to update the table in serialization.mdx by running make fix-copies


@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
    return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
Member

nit: it could be helpful to indicate which hidden state we're referring to explicitly

Suggested change
-    return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
+    return OrderedDict({"encoder_last_hidden_state": {0: "batch", 1: "encoder_sequence"}})

Contributor Author

There is an output name check between the reference model and the ONNX exported model, so if we change the name it results in an error: Outputs doesn't match between reference model and ONNX exported model: {'encoder_last_hidden_state'}
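
For anyone reading along, the check being referred to is roughly of this shape (a simplified stand-in, not the actual code in transformers/onnx/convert.py):

# Simplified stand-in for the output-name validation mentioned above.
reference_output_names = {"last_hidden_state"}        # names produced by the reference PyTorch model
onnx_output_names = {"encoder_last_hidden_state"}     # names declared by the ONNX config

if not onnx_output_names.issubset(reference_output_names):
    raise ValueError(
        "Outputs doesn't match between reference model and ONNX exported model: "
        f"{onnx_output_names - reference_output_names}"
    )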

@WaterKnight1998

Hi, I would like to help here. It would make using Donut with ONNX easier :) @mht-sharma I can help fix the errors that @lewtun commented on.

@mht-sharma
Contributor Author

Hi @WaterKnight1998, thanks for the help. I have updated the PR with a new commit addressing the comments.

@mht-sharma mht-sharma marked this pull request as ready for review October 7, 2022 12:57
@mht-sharma
Contributor Author

Thanks for adding ONNX support for this highly requested model type @mht-sharma 🔥 !!

Overall the PR looks great and there are a few small things we need to correct:

  • since this model type is kind of special, I think we should document somewhere in the serialization.mdx docs that these models produce two files
  • you need to update the table in serialization.mdx by running make fix-copies

Both points are addressed:

  • Added a Tip about the generation of two ONNX files for VisionEncoderDecoder models.
  • Updated serialization.mdx by running make fix-copies

Member

@lewtun lewtun left a comment

Thanks for iterating on this @mht-sharma - it's looking very good!

I've left a bunch of nits, but once they're addressed I think this should be good to merge. Would you mind confirming that all the slow tests pass with these changes?

RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py

@mht-sharma
Contributor Author

All tests pass

Member

@lewtun lewtun left a comment

Thanks for addressing the final changes @mht-sharma!

This PR LGTM, so gently pinging @sgugger for final approval

Collaborator

@sgugger sgugger left a comment

Thanks for adding support for those models!

@umanniyaz

@NielsRogge @mht-sharma I referred to some of the issues on model conversion, and I have converted my TrOCR base printed model into two files, encoder.onnx and decoder.onnx. Now, as per #19811, which I have followed thoroughly, I was trying ORTEncoder and ORTDecoder, but there seem to be issues with model.generate and it also gives a backhooks issue. Can you help here?

@mht-sharma
Contributor Author

mht-sharma commented Dec 7, 2022

Hi @umanniyaz, could you share the error message you are getting? It would probably be best to open a new issue, or comment on an existing one, with a sample snippet and the error message.

I will be adding support for the above model in Optimum ONNX Runtime soon, which will enable you to run inference with the model directly. I will update the PR here in a few days to keep you in the loop.

@huggingface huggingface deleted a comment from umanniyaz Dec 7, 2022
@huggingface huggingface deleted a comment from umanniyaz Dec 7, 2022
@matthewchung74

matthewchung74 commented Dec 28, 2022

Hi @mht-sharma, I'm getting this error message

!python -m transformers.onnx --model="microsoft/trocr-large-printed" --feature=vision2seq-lm onnx/ --atol 1e-3

Framework not requested. Using torch to export to ONNX.
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-printed and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using framework PyTorch: 1.13.0+cu116
/usr/local/lib/python3.8/dist-packages/transformers/models/vit/modeling_vit.py:176: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/usr/local/lib/python3.8/dist-packages/transformers/models/vit/modeling_vit.py:181: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height != self.image_size[0] or width != self.image_size[1]:
tcmalloc: large alloc 1219141632 bytes == 0x12f19c000 @  0x7fb5e825d887 0x7fb5e6b53c29 0x7fb5e6b54afb 0x7fb5e6b54bb4 0x7fb5e6b54f9c 0x7fb520720a74 0x7fb520720fa5 0x7fb50ff9bced 0x7fb5355c16b4 0x7fb53507e6af 0x5d80be 0x5d8d8c 0x4fedd4 0x4997c7 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55d078 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4990ca 0x5d8868 0x4990ca 0x55cd91 0x55d743 0x627376
tcmalloc: large alloc 1219141632 bytes == 0x177c46000 @  0x7fb5e825b1e7 0x4d30a0 0x5dede2 0x7fb5355c16eb 0x7fb53507e6af 0x5d80be 0x5d8d8c 0x4fedd4 0x4997c7 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55d078 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4990ca 0x5d8868 0x4990ca 0x55cd91 0x55d743 0x627376 0x5aaeb9 0x4990ca 0x55cd91 0x5d8941 0x4990ca
Validating ONNX model...
	-[✓] ONNX model output names match reference model ({'last_hidden_state'})
	- Validating ONNX Model output "last_hidden_state":
		-[✓] (3, 577, 1024) matches (3, 577, 1024)
		-[x] values not close enough (atol: 0.001)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/__main__.py", line 107, in main
    validate_model_outputs(
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/convert.py", line 472, in validate_model_outputs
    raise ValueError(
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.002560138702392578 for [ -6.9223547   -1.2663026   -6.01969     -4.5837965    6.1642013
   2.0628624   -3.5507686    1.9246662   -0.676687    -1.4317726
  -2.3281486   -1.0843381   -8.45131     -5.4161043    6.8614235
   4.190215    -5.2153773    3.0814483    0.63340956  -2.0609605
  -8.46502      0.8696586   -6.97839      2.6996267    2.5350282
  -8.500374    -4.0548806    1.1920781   -3.4029136   -5.475586
   0.7265783    1.5491551  -11.724315   -10.578344     0.1786897
  -0.5544502   -0.03817908  -1.506731    -6.7059665    2.4884484
 -10.02477      3.6103365    4.648042  ] vs [ -6.921267    -1.2674816   -6.02225     -4.585174     6.1653543
   2.0647495   -3.551866     1.9266967   -0.67827606  -1.4307456
  -2.3269713   -1.0858293   -8.453766    -5.4173226    6.862612
   4.1921353   -5.216509     3.0834184    0.6319726   -2.0598402
  -8.4638815    0.8707002   -6.9765515    2.6985872    2.533873
  -8.501539    -4.0534215    1.1903901   -3.401852    -5.477874
   0.72761816   1.5506129  -11.725906   -10.577168     0.1799282
  -0.55576885  -0.0398552   -1.5055205   -6.7073493    2.489525
 -10.022912     3.6091       4.646897  ]

https://colab.research.google.com/drive/1CxngHndMjLmpRkDreOS2GJSHbtcH1Olr#scrollTo=fc5SIz6uzs6p

In Colab, when switching to GPU, the result is similar.

@BakingBrains
Contributor

BakingBrains commented Dec 30, 2022

@matthewchung74 it works with --atol 1e-2
!python -m transformers.onnx --model="microsoft/trocr-large-printed" --feature=vision2seq-lm onnx/ --atol 1e-2


@matthewchung74

@BakingBrains thank you very much. Do you see any improvement in performance with ONNX? I don't, on either GPU or CPU, and I'm not sure I really understand. If you have any general thoughts, they'd be appreciated. I have sample code below.
https://colab.research.google.com/drive/1CxngHndMjLmpRkDreOS2GJSHbtcH1Olr#scrollTo=I0xqkudSoxZw

@BakingBrains
Contributor

BakingBrains commented Dec 30, 2022

@matthewchung74 There won't be much difference in terms of OCR quality, but there will be a difference in inference speed as well as in resource consumption.

@matthewchung74

Odd, I must be doing something wrong, since my inference using ONNX is 75% slower. Thanks for the response; I'll have to work on it some more.

@BakingBrains
Contributor

ONNX inference on GPU or CPU? In one of my cases the ONNX pipeline took 4.7 sec on GPU and 6.2 sec on CPU.

For the original model, the pipeline's inference time on CPU was 7.4 sec, whereas for ONNX it was 4.1 sec.

@umanniyaz

After getting the ONNX encoder.onnx and decoder.onnx and running them in the Seq2Seq ONNX pipeline, model inference speed improves but OCR accuracy gets worse.

@umanniyaz

use this for inference @matthewchung74 @BakingBrains #20644

@matthewchung74

@umanniyaz I tried a script which is pretty much the same as the one @mht-sharma has here: https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39. The inference script in #20644 also looks pretty much the same as @mht-sharma's. I'm not really sure what I am missing. Do you have your experiment with the better performance in a Colab or something shareable?

@mht-sharma
Contributor Author

Hi @matthewchung74, I am working on the inference of such models in optimum@588. This implementation will use IO binding to make inference faster on GPU.

As per the thread above, the model was exported using an --atol of 1e-2, which is quite high and may result in an accuracy drop at inference. I will check this separately once the above implementation is completed.
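
If it helps, here is a rough way to measure the actual encoder discrepancy before picking an --atol; the file path, the 384x384 input size, and the pixel_values input name are assumptions for illustration:

import numpy as np
import onnxruntime as ort
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-printed").eval()
session = ort.InferenceSession("onnx/encoder_model.onnx")

pixel_values = torch.randn(1, 3, 384, 384)  # assumed input resolution for trocr-large-printed
with torch.no_grad():
    ref = model.encoder(pixel_values).last_hidden_state.numpy()
onnx_out = session.run(None, {"pixel_values": pixel_values.numpy()})[0]

# Pick an --atol slightly above this value.
print("max abs diff:", np.abs(ref - onnx_out).max())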

@umanniyaz

umanniyaz commented Jan 4, 2023

Updated!
Hi @matthewchung74, you could probably try this:
https://github.com/umanniyaz/TR-OCR-ONNX-Optimization

From this original script https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39 as shared by @mht-sharma

It gives good inference speed and the model's accuracy remains preserved.

Try to keep the model initialisations at compile time (i.e. create the sessions once up front); see the rough sketch below.
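
A tiny illustration of that last point; the encoder file name and its pixel_values input name are assumptions:

import time
import numpy as np
import onnxruntime as ort

# Build the session once at startup, then time only the repeated run() calls.
encoder = ort.InferenceSession("onnx/encoder_model.onnx")

dummy = np.random.rand(1, 3, 384, 384).astype(np.float32)
start = time.perf_counter()
for _ in range(10):
    encoder.run(None, {"pixel_values": dummy})
print("avg encoder latency (s):", (time.perf_counter() - start) / 10)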

@matthewchung74

matthewchung74 commented Jan 5, 2023

@umanniyaz thank you for sharing. I'm still seeing some performance issues; I'm running your code almost as-is and getting the following output.

Model Output Original is :  TICKET
Original time:  1.5399658679962158 TICKET
Model Ouput ORT is :  TICKET
ORT time:  3.5313572883605957 TICKET

Here is the code. Perhaps the difference is the test image; is your test image something you can share?
https://colab.research.google.com/drive/1ojsslQPxUO67_dGzI4ok4rRk6CSlcwf4?usp=sharing

@Kamilya2020

Hello, could someone give me an ONNX model for TrOCR, please?

@mht-sharma
Contributor Author

Hi @Kamilya2020, please use the following guide to export the model to ONNX: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model

@Kamilya2020

Hi @mht-sharma,
can you help me please? I have this code:

using Microsoft.AspNetCore.Mvc;
using Microsoft.ML;
using Microsoft.ML.Data;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;
using SixLabors.ImageSharp.PixelFormats;

namespace OcrSolution.API.Controllers;

public class ModelInput
{
    [ColumnName("input1")]
    [VectorType(1,1, 64, 64)]
    public float[,,,] Input { get; set; }
}

public class ModelOutput
{
    [ColumnName("output")]
    [VectorType(1,1, 97)]
    public float[,,] Output { get; set; }
}

public class OcrController : ControllerBase
{
    private readonly MLContext _mlContext;
    private PredictionEngine<ModelInput, ModelOutput> _predictionEngine;

    public OcrController()
    {
        _mlContext = new MLContext();

        try
        {
            // Load the ONNX model
            var modelPath =
                "C:\\Users\\k.mimouni\\Desktop\\ocr web app\\sw-kamilia-2023\\components\\Server\\OcrSolution.API\\assets\\Model\\1_recognition_model.onnx";
            var pipeline = _mlContext.Transforms.ApplyOnnxModel(modelPath);
            var dataView = _mlContext.Data.LoadFromEnumerable(new[] { new ModelInput() });
            var transformer = pipeline.Fit(dataView);

            // Verify transformer is not null after fitting
            if (transformer == null)
            {
                throw new Exception("Transformer is null after fitting the pipeline.");
            }

            _predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(transformer);

            // Verify predictionEngine is not null after creation
            if (_predictionEngine == null)
            {
                throw new Exception("Prediction engine is null after creation.");
            }

            Console.WriteLine("ONNX model loaded successfully.");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error loading ONNX model: {ex.Message}");
            _predictionEngine = null; // Set _predictionEngine to null in case of an error
        }
    }

   [HttpPost("ocr")]

public IActionResult PerformOCR(IFormFile imageFile)
{
if (_predictionEngine != null)
{
try
{
// Check if a file is uploaded
if (imageFile != null && imageFile.Length > 0)
{
// Resize and load the image data
var image = ResizeImage(imageFile);
if (image == null)
{
Console.WriteLine("Failed to resize the image.");
return StatusCode(500, "Failed to resize the image.");
}

            var imageData = LoadImageData(image);

            // Create the model input
            var input = new ModelInput { Input = imageData };

            if (input == null)
            {
                Console.WriteLine("Failed to create the model input.");
                return StatusCode(500, "Failed to create the model input.");
            }

            // Make a prediction
            var prediction = _predictionEngine.Predict(input);

            if (prediction == null || prediction.Output == null)
            {
                Console.WriteLine("Failed to make a prediction or prediction output is null.");
                return StatusCode(500, "Failed to make a prediction or prediction output is null.");
            }

            // Process the output and extract the text
            var extractedText = ExtractText(prediction.Output);

            // Return the extracted text
            return Ok(new { text = extractedText });
        }

        // No image file uploaded
        Console.WriteLine("No image file uploaded.");
        return BadRequest("No image file uploaded.");
    }
    catch (Exception ex)
    {
        // Error occurred while performing OCR
        Console.WriteLine($"An error occurred while performing OCR: {ex}");
        return StatusCode(500, $"An error occurred while performing OCR: {ex}");
    }
}

// Error loading ONNX model or prediction engine
return StatusCode(500, "Error loading ONNX model or prediction engine.");

}

    private Image<Rgba32> ResizeImage(IFormFile imageFile)
    {
        using (var memoryStream = new MemoryStream())
        {
            // Copy the file content to a memory stream
            imageFile.CopyTo(memoryStream);

            // Load the image using SixLabors.ImageSharp
            memoryStream.Seek(0, SeekOrigin.Begin);
            var image = Image.Load<Rgba32>(memoryStream, out var format);

            if (image == null)
            {
                throw new Exception("Failed to load the image.");
            }

            // Resize the image
            var resizedImage = image.Clone(x => x.Resize(new ResizeOptions
            {
                Size = new Size(754, 64),
                Mode = ResizeMode.Stretch
            }));

            if (resizedImage == null)
            {
                throw new Exception("Failed to resize the image.");
            }

            return resizedImage;
        }
    }

    private float[,,,] LoadImageData(Image<Rgba32> image)
    {
        var imageData = new float[1, 1, image.Height, image.Width];

        // Iterate over the pixels and convert them to float values
        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                var pixel = image[x, y];
                var pixelValue = GetPixelValue(pixel);
                imageData[0, 0, y, x] = pixelValue;
            }
        }

        return imageData;
    }


    private float GetPixelValue(Rgba32 pixel)
    {
        // Normalize the pixel value to the range [0, 1]
        return pixel.R / 255f;
    }

    private string ExtractText(float[,,] output)
    {
        string extractedText = "";
        for (int b = 0; b < output.GetLength(0); b++) // batch size
        {
            for (int h = 0; h < output.GetLength(1); h++) // output height
            {
                for (int w = 0; w < output.GetLength(2); w++) // output width (number of characters)
                {
                    int maxIndex = 0;
                    float maxValue = 0;
                    for (int c = 0; c < output.GetLength(2); c++)
                    {
                        if (output[b, h, c] > maxValue)
                        {
                            maxIndex = c;
                            maxValue = output[b, h, c];
                        }
                    }

                    char predictedChar = (char)maxIndex;
                    extractedText += predictedChar;
                }
            }
        }

        return extractedText;
    }
}

Here's the error:

An error occurred while performing OCR: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)
   at OcrSolution.API.Controllers.OcrController.PerformOCR(IFormFile imageFile) in C:\Users\k.mimouni\Desktop\ocr web app\sw-kamilia-2023\components\Server\OcrSolution.API\Controllers\OcrController.cs:line 95

@Kamilya2020

My ONNX model is from this link: JaidedAI/EasyOCR#786

@nisargmehta-groww

I am facing this error when using the ONNX export code for "trocr-large-printed". Can someone please help?

model_ckpt = "microsoft/trocr-large-printed"
!python -m transformers.onnx --model={model_ckpt} --feature=vision2seq-lm onnx/ --atol 1e-2
usage: Hugging Face Transformers ONNX exporter [-h] -m MODEL
                                               [--feature {causal-lm,causal-lm-with-past,default,default-with-past,image-classification,masked-lm,question-answering,seq2seq-lm,seq2seq-lm-with-past,sequence-classification,token-classification}]
                                               [--opset OPSET] [--atol ATOL] [--framework {pt,tf}] [--cache_dir CACHE_DIR]
                                               output
Hugging Face Transformers ONNX exporter: error: argument --feature: invalid choice: 'vision2seq-lm' (choose from 'causal-lm', 'causal-lm-with-past', 'default', 'default-with-past', 'image-classification', 'masked-lm', 'question-answering', 'seq2seq-lm', 'seq2seq-lm-with-past', 'sequence-classification', 'token-classification')

@donjuanpond

Hi @nisargmehta-groww, I used the Python library instead of the CLI to do this conversion. Here's my code (my custom-trained TrOCR is saved at /content/content/TrOCR-model-initial-cut; you should be able to just replace that with microsoft/trocr-large-printed when you run it).

from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor

# Load the pretrained TrOCR model
model_id = "/content/content/TrOCR-model-initial-cut"
model = ORTModelForVision2Seq.from_pretrained(model_id, export=True, task="image-to-text", use_cache=False)
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# Save the ONNX model
model.save_pretrained("onnx_trocr_model")
processor.save_pretrained("onnx_trocr_model")
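
Loading that saved export back and running generation might then look roughly like this; the image path is a placeholder and the generate defaults are assumptions:

from PIL import Image
from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor

# Load the exported ONNX model and the saved processor from the same directory.
model = ORTModelForVision2Seq.from_pretrained("onnx_trocr_model", use_cache=False)
processor = TrOCRProcessor.from_pretrained("onnx_trocr_model")

image = Image.open("text_line.png").convert("RGB")  # placeholder image path
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])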
