Conversation

@mht-sharma
Contributor

What does this PR do?

Fixes #14812
This PR enables the export of VisionEncoderDecoder models to ONNX.

A VisionEncoderDecoder model contains two parts: a vision transformer encoder and a language-modelling decoder. The two parts are exported to ONNX separately, as encoder_model.onnx and decoder_model.onnx.

To enable the export of the model, the export call in the main file is split into two paths based on the model_kind.

Usage

model_ckpt = "nlpconnect/vit-gpt2-image-captioning"
!python -m transformers.onnx --model={model_ckpt} --feature=vision2seq-lm onnx/ --atol 1e-3
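
For context, here is a rough sketch (not part of this PR) of how the two exported files could be wired together with onnxruntime for plain greedy decoding. The file names follow the defaults above, but the input/output names (pixel_values, encoder_hidden_states, input_ids, logits), the image path, and the absence of an attention mask are assumptions and may need adjusting:

import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import AutoTokenizer, ViTFeatureExtractor

ckpt = "nlpconnect/vit-gpt2-image-captioning"
feature_extractor = ViTFeatureExtractor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

encoder = ort.InferenceSession("onnx/encoder_model.onnx")
decoder = ort.InferenceSession("onnx/decoder_model.onnx")

# Run the vision encoder once per image.
pixel_values = feature_extractor(Image.open("image.jpg"), return_tensors="np").pixel_values
encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]

# Plain greedy decoding (no past key values, no beam search).
input_ids = np.array([[tokenizer.bos_token_id]], dtype=np.int64)
for _ in range(32):
    logits = decoder.run(
        None, {"input_ids": input_ids, "encoder_hidden_states": encoder_hidden_states}
    )[0]
    next_token = logits[:, -1].argmax(-1).reshape(1, 1).astype(np.int64)
    input_ids = np.concatenate([input_ids, next_token], axis=-1)
    if next_token.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))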

# Ensure the requested opset is sufficient
if args.opset is None:
    args.opset = onnx_config.default_onnx_opset
if model_kind == "vision-encoder-decoder":
Contributor Author

This line creates two workflows based on the type of the model: the first is for VisionEncoderDecoder models and the second is the old export workflow. It is currently hardcoded for 'vision-encoder-decoder'. I would appreciate feedback on whether we can do this better.

Another approach I was considering is to make it generic, e.g. model.hasattr(encoder) and model.hasattr(decoder). This should cover all models with two parts in the config, but I am not sure what problems that could cause in the future.

Member

Yeah, ideally we would want something generic that also makes it easy to add the other encoder-decoder models.

Your idea to use model.hasattr(encoder) won't be sufficient because all seq2seq models like t5 also have this attribute. I've asked internally to see if the core maintainers know how we can distinguish this case.

Internal Slack thread: https://huggingface.slack.com/archives/C01N44FJDHT/p1664797000511649

Member

OK based on internal discussion, using model_kind is the current best choice here. My suggestion would be to create something like ENCODER_DECODER_MODELS = ["vision-encoder-decoder"] and then we can populate it with the other modalities as needed
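
A minimal sketch of what that suggestion could look like in the exporter's main script; export_encoder_decoder is a hypothetical helper name, not the exact code that was merged:

ENCODER_DECODER_MODELS = ["vision-encoder-decoder"]

if model_kind in ENCODER_DECODER_MODELS:
    # Export the two sub-models to separate files, e.g. encoder_model.onnx
    # and decoder_model.onnx (export_encoder_decoder is illustrative only).
    export_encoder_decoder(preprocessor, model, onnx_config, args.opset, args.output)
else:
    # Original single-file export path.
    export(preprocessor, model, onnx_config, args.opset, args.output)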

Contributor Author

Updated

Hello, when I convert the TrOCR model to encoder.onnx and decoder.onnx, why do we use "processor = TrOCRProcessor.from_pretrained("original basemodel")"? Can I use encoder.onnx or decoder.onnx to replace it?

Contributor Author

Hi @dcdethan, could you please elaborate on the issue? Why would you like to use the ONNX models to load the preprocessor? Please create a new issue in Optimum, as ONNX exports have been migrated to the exporters in Optimum: https://github.com/huggingface/optimum/issues

OK, I will open an issue as you said. I am using your example, the onnx_trocr_inference.py link.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 30, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

@lewtun lewtun left a comment

Thanks for adding ONNX support for this highly requested model type @mht-sharma 🔥 !!

Overall the PR looks great and there are a few small things we need to correct:

  • since this model type is kind of special, I think we should document somewhere in the serialization.mdx docs that these models produce two files
  • you need to update the table in serialization.mdx by running make fix-copies


@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
    return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
Member

nit: it could be helpful to indicate which hidden state we're referring to explicitly

Suggested change
-    return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
+    return OrderedDict({"encoder_last_hidden_state": {0: "batch", 1: "encoder_sequence"}})

Contributor Author

There is an output name check between the reference model and the ONNX exported model, so if we change the name it results in an error: Outputs doesn't match between reference model and ONNX exported model: {'encoder_last_hidden_state'}
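
For anyone reading along, the check being referred to is roughly of this shape (a simplified stand-in, not the actual code in transformers/onnx/convert.py):

# Simplified stand-in for the output-name validation mentioned above.
reference_output_names = {"last_hidden_state"}        # names produced by the reference PyTorch model
onnx_output_names = {"encoder_last_hidden_state"}     # names declared by the ONNX config

if not onnx_output_names.issubset(reference_output_names):
    raise ValueError(
        "Outputs doesn't match between reference model and ONNX exported model: "
        f"{onnx_output_names - reference_output_names}"
    )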

@WaterKnight1998

Hi, I would like to help here. It would make using Donut with ONNX easier :) @mht-sharma I can help fix the errors that @lewtun commented on.

@mht-sharma
Contributor Author

Hi @WaterKnight1998, thanks for the help. I have updated the PR with a new commit addressing the comments.

@mht-sharma mht-sharma marked this pull request as ready for review October 7, 2022 12:57
@mht-sharma
Contributor Author

Thanks for adding ONNX support for this highly requested model type @mht-sharma 🔥 !!

Overall the PR looks great and there are a few small things we need to correct:

  • since this model type is kind of special, I think we should document somewhere in the serialization.mdx docs that these models produce two files
  • you need to update the table in serialization.mdx by running make fix-copies

Both points are addressed:

  • Added a Tip about the generation of two ONNX files for VisionEncoderDecoder models.
  • Updated serialization.mdx by running make fix-copies

Member

@lewtun lewtun left a comment

Thanks for iterating on this @mht-sharma - it's looking very good!

I've left a bunch of nits, but once they're addressed I think this should be good to merge. Would you mind confirming that all the slow tests pass with these changes?

RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py

@mht-sharma
Contributor Author

All tests pass

Member

@lewtun lewtun left a comment

Thanks for addressing the final changes @mht-sharma!

This PR LGTM, so gently pinging @sgugger for final approval

Collaborator

@sgugger sgugger left a comment

Thanks for adding support for those models!

@umanniyaz

@NielsRogge @mht-sharma I referred to some of the issues on model conversion, and I have converted my TrOCR base printed model into two files, encoder.onnx and decoder.onnx. Now, as per #19811, which I have followed thoroughly, I was trying ORTEncoder and ORTDecoder, but there seem to be issues with model.generate and it also gives a backhooks issue. Can you help here?

@mht-sharma
Contributor Author

mht-sharma commented Dec 7, 2022

Hi @umanniyaz, could you share the error message you are getting? It would probably be best to open a new issue, or comment on an existing one, with a sample snippet and the error message.

I will be adding support for the above model in Optimum ONNX Runtime soon, which will enable you to run inference with the model directly. I will update the PR here in a few days to keep you in the loop.

@huggingface huggingface deleted a comment from umanniyaz Dec 7, 2022
@huggingface huggingface deleted a comment from umanniyaz Dec 7, 2022
@matthewchung74

matthewchung74 commented Dec 28, 2022

Hi @mht-sharma, I'm getting this error message

!python -m transformers.onnx --model="microsoft/trocr-large-printed" --feature=vision2seq-lm onnx/ --atol 1e-3

Framework not requested. Using torch to export to ONNX.
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-printed and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using framework PyTorch: 1.13.0+cu116
/usr/local/lib/python3.8/dist-packages/transformers/models/vit/modeling_vit.py:176: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/usr/local/lib/python3.8/dist-packages/transformers/models/vit/modeling_vit.py:181: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height != self.image_size[0] or width != self.image_size[1]:
tcmalloc: large alloc 1219141632 bytes == 0x12f19c000 @  0x7fb5e825d887 0x7fb5e6b53c29 0x7fb5e6b54afb 0x7fb5e6b54bb4 0x7fb5e6b54f9c 0x7fb520720a74 0x7fb520720fa5 0x7fb50ff9bced 0x7fb5355c16b4 0x7fb53507e6af 0x5d80be 0x5d8d8c 0x4fedd4 0x4997c7 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55d078 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4990ca 0x5d8868 0x4990ca 0x55cd91 0x55d743 0x627376
tcmalloc: large alloc 1219141632 bytes == 0x177c46000 @  0x7fb5e825b1e7 0x4d30a0 0x5dede2 0x7fb5355c16eb 0x7fb53507e6af 0x5d80be 0x5d8d8c 0x4fedd4 0x4997c7 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x49abe4 0x55d078 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4990ca 0x5d8868 0x4990ca 0x55cd91 0x55d743 0x627376 0x5aaeb9 0x4990ca 0x55cd91 0x5d8941 0x4990ca
Validating ONNX model...
	-[✓] ONNX model output names match reference model ({'last_hidden_state'})
	- Validating ONNX Model output "last_hidden_state":
		-[✓] (3, 577, 1024) matches (3, 577, 1024)
		-[x] values not close enough (atol: 0.001)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/__main__.py", line 107, in main
    validate_model_outputs(
  File "/usr/local/lib/python3.8/dist-packages/transformers/onnx/convert.py", line 472, in validate_model_outputs
    raise ValueError(
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.002560138702392578 for [ -6.9223547   -1.2663026   -6.01969     -4.5837965    6.1642013
   2.0628624   -3.5507686    1.9246662   -0.676687    -1.4317726
  -2.3281486   -1.0843381   -8.45131     -5.4161043    6.8614235
   4.190215    -5.2153773    3.0814483    0.63340956  -2.0609605
  -8.46502      0.8696586   -6.97839      2.6996267    2.5350282
  -8.500374    -4.0548806    1.1920781   -3.4029136   -5.475586
   0.7265783    1.5491551  -11.724315   -10.578344     0.1786897
  -0.5544502   -0.03817908  -1.506731    -6.7059665    2.4884484
 -10.02477      3.6103365    4.648042  ] vs [ -6.921267    -1.2674816   -6.02225     -4.585174     6.1653543
   2.0647495   -3.551866     1.9266967   -0.67827606  -1.4307456
  -2.3269713   -1.0858293   -8.453766    -5.4173226    6.862612
   4.1921353   -5.216509     3.0834184    0.6319726   -2.0598402
  -8.4638815    0.8707002   -6.9765515    2.6985872    2.533873
  -8.501539    -4.0534215    1.1903901   -3.401852    -5.477874
   0.72761816   1.5506129  -11.725906   -10.577168     0.1799282
  -0.55576885  -0.0398552   -1.5055205   -6.7073493    2.489525
 -10.022912     3.6091       4.646897  ]

https://colab.research.google.com/drive/1CxngHndMjLmpRkDreOS2GJSHbtcH1Olr#scrollTo=fc5SIz6uzs6p

In Colab, when switching to GPU, the result is similar.

@BakingBrains
Contributor

BakingBrains commented Dec 30, 2022

@matthewchung74 it works with --atol 1e-2
!python -m transformers.onnx --model="microsoft/trocr-large-printed" --feature=vision2seq-lm onnx/ --atol 1e-2


@matthewchung74

@BakingBrains thank you very much. Do you see any improvement in performance with ONNX? I don't, on either GPU or CPU, and I'm not sure I really understand. If you have any general thoughts, they'd be appreciated. I have sample code below.
https://colab.research.google.com/drive/1CxngHndMjLmpRkDreOS2GJSHbtcH1Olr#scrollTo=I0xqkudSoxZw

@BakingBrains
Contributor

BakingBrains commented Dec 30, 2022

@matthewchung74 There won't be much difference in terms of OCR quality, but there will be a difference in inference speed as well as in resource consumption.

@matthewchung74

Odd, I must be doing something wrong, since my inference using ONNX is 75% slower. Thanks for the response; I'll have to work on it some more.

@BakingBrains
Contributor

ONNX inference on GPU or CPU? In one of my cases the ONNX pipeline took 4.7 sec on GPU and 6.2 sec on CPU.

For the original model, the pipeline's inference time on CPU was 7.4 sec, whereas for ONNX it was 4.1 sec.

@umanniyaz

After getting the ONNX encoder.onnx and decoder.onnx and running them in the Seq2Seq ONNX pipeline, model inference speed improves but OCR accuracy gets worse.

@umanniyaz

use this for inference @matthewchung74 @BakingBrains #20644

@matthewchung74

@umanniyaz I tried a script which is pretty much the same as the one @mht-sharma has here: https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39. The inference script in #20644 also looks pretty much the same as @mht-sharma's. I'm not really sure what I am missing. Do you have your experiment with the better performance in a Colab or something shareable?

@mht-sharma
Contributor Author

Hi @matthewchung74, I am working on the inference of such models in optimum@588. This implementation will use IO binding to make inference faster on GPU.

As per the thread above, the model was exported using an --atol of 1e-2, which is quite high and may result in an accuracy drop at inference. I will check this separately once the above implementation is completed.
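
If it helps, here is a rough way to measure the actual encoder discrepancy before picking an --atol; the file path, the 384x384 input size, and the pixel_values input name are assumptions for illustration:

import numpy as np
import onnxruntime as ort
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-printed").eval()
session = ort.InferenceSession("onnx/encoder_model.onnx")

pixel_values = torch.randn(1, 3, 384, 384)  # assumed input resolution for trocr-large-printed
with torch.no_grad():
    ref = model.encoder(pixel_values).last_hidden_state.numpy()
onnx_out = session.run(None, {"pixel_values": pixel_values.numpy()})[0]

# Pick an --atol slightly above this value.
print("max abs diff:", np.abs(ref - onnx_out).max())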

@umanniyaz

umanniyaz commented Jan 4, 2023

Updated!
Hi @matthewchung74, you could probably try this:
https://github.com/umanniyaz/TR-OCR-ONNX-Optimization

From this original script https://gist.github.com/mht-sharma/f38c670930ac7df413c07327e692ee39 as shared by @mht-sharma

It gives good inference speed and the model's accuracy remains preserved.

Try to keep the model initialisations at compile time (i.e. create the sessions once up front); see the rough sketch below.
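
A tiny illustration of that last point; the encoder file name and its pixel_values input name are assumptions:

import time
import numpy as np
import onnxruntime as ort

# Build the session once at startup, then time only the repeated run() calls.
encoder = ort.InferenceSession("onnx/encoder_model.onnx")

dummy = np.random.rand(1, 3, 384, 384).astype(np.float32)
start = time.perf_counter()
for _ in range(10):
    encoder.run(None, {"pixel_values": dummy})
print("avg encoder latency (s):", (time.perf_counter() - start) / 10)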

@matthewchung74

matthewchung74 commented Jan 5, 2023

@umanniyaz thank you for sharing. I'm still seeing some performance issues; I'm running your code almost as-is and getting the following output.

Model Output Original is :  TICKET
Original time:  1.5399658679962158 TICKET
Model Ouput ORT is :  TICKET
ORT time:  3.5313572883605957 TICKET

Here is the code. Perhaps the difference is the test image; is your test image something you can share?
https://colab.research.google.com/drive/1ojsslQPxUO67_dGzI4ok4rRk6CSlcwf4?usp=sharing

@Kamilya2020

Hello, could someone give me an ONNX model for TrOCR, please?

@mht-sharma
Contributor Author

Hi @Kamilya2020, please use the following guide to export the model to ONNX: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model

@Kamilya2020

Hi @mht-sharma,
can you help me please? I have this code:

using Microsoft.AspNetCore.Mvc;
using Microsoft.ML;
using Microsoft.ML.Data;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;
using SixLabors.ImageSharp.PixelFormats;

namespace OcrSolution.API.Controllers;

public class ModelInput
{
    [ColumnName("input1")]
    [VectorType(1,1, 64, 64)]
    public float[,,,] Input { get; set; }
}

public class ModelOutput
{
    [ColumnName("output")]
    [VectorType(1,1, 97)]
    public float[,,] Output { get; set; }
}

public class OcrController : ControllerBase
{
    private readonly MLContext _mlContext;
    private PredictionEngine<ModelInput, ModelOutput> _predictionEngine;

    public OcrController()
    {
        _mlContext = new MLContext();

        try
        {
            // Load the ONNX model
            var modelPath =
                "C:\\Users\\k.mimouni\\Desktop\\ocr web app\\sw-kamilia-2023\\components\\Server\\OcrSolution.API\\assets\\Model\\1_recognition_model.onnx";
            var pipeline = _mlContext.Transforms.ApplyOnnxModel(modelPath);
            var dataView = _mlContext.Data.LoadFromEnumerable(new[] { new ModelInput() });
            var transformer = pipeline.Fit(dataView);

            // Verify transformer is not null after fitting
            if (transformer == null)
            {
                throw new Exception("Transformer is null after fitting the pipeline.");
            }

            _predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(transformer);

            // Verify predictionEngine is not null after creation
            if (_predictionEngine == null)
            {
                throw new Exception("Prediction engine is null after creation.");
            }

            Console.WriteLine("ONNX model loaded successfully.");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error loading ONNX model: {ex.Message}");
            _predictionEngine = null; // Set _predictionEngine to null in case of an error
        }
    }

   [HttpPost("ocr")]

public IActionResult PerformOCR(IFormFile imageFile)
{
if (_predictionEngine != null)
{
try
{
// Check if a file is uploaded
if (imageFile != null && imageFile.Length > 0)
{
// Resize and load the image data
var image = ResizeImage(imageFile);
if (image == null)
{
Console.WriteLine("Failed to resize the image.");
return StatusCode(500, "Failed to resize the image.");
}

            var imageData = LoadImageData(image);

            // Create the model input
            var input = new ModelInput { Input = imageData };

            if (input == null)
            {
                Console.WriteLine("Failed to create the model input.");
                return StatusCode(500, "Failed to create the model input.");
            }

            // Make a prediction
            var prediction = _predictionEngine.Predict(input);

            if (prediction == null || prediction.Output == null)
            {
                Console.WriteLine("Failed to make a prediction or prediction output is null.");
                return StatusCode(500, "Failed to make a prediction or prediction output is null.");
            }

            // Process the output and extract the text
            var extractedText = ExtractText(prediction.Output);

            // Return the extracted text
            return Ok(new { text = extractedText });
        }

        // No image file uploaded
        Console.WriteLine("No image file uploaded.");
        return BadRequest("No image file uploaded.");
    }
    catch (Exception ex)
    {
        // Error occurred while performing OCR
        Console.WriteLine($"An error occurred while performing OCR: {ex}");
        return StatusCode(500, $"An error occurred while performing OCR: {ex}");
    }
}

// Error loading ONNX model or prediction engine
return StatusCode(500, "Error loading ONNX model or prediction engine.");

}

    private Image<Rgba32> ResizeImage(IFormFile imageFile)
    {
        using (var memoryStream = new MemoryStream())
        {
            // Copy the file content to a memory stream
            imageFile.CopyTo(memoryStream);

            // Load the image using SixLabors.ImageSharp
            memoryStream.Seek(0, SeekOrigin.Begin);
            var image = Image.Load<Rgba32>(memoryStream, out var format);

            if (image == null)
            {
                throw new Exception("Failed to load the image.");
            }

            // Resize the image
            var resizedImage = image.Clone(x => x.Resize(new ResizeOptions
            {
                Size = new Size(754, 64),
                Mode = ResizeMode.Stretch
            }));

            if (resizedImage == null)
            {
                throw new Exception("Failed to resize the image.");
            }

            return resizedImage;
        }
    }

    private float[,,,] LoadImageData(Image<Rgba32> image)
    {
        var imageData = new float[1, 1, image.Height, image.Width];

        // Iterate over the pixels and convert them to float values
        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                var pixel = image[x, y];
                var pixelValue = GetPixelValue(pixel);
                imageData[0, 0, y, x] = pixelValue;
            }
        }

        return imageData;
    }


    private float GetPixelValue(Rgba32 pixel)
    {
        // Normalize the pixel value to the range [0, 1]
        return pixel.R / 255f;
    }

    private string ExtractText(float[,,] output)
    {
        string extractedText = "";
        for (int b = 0; b < output.GetLength(0); b++) // batch size
        {
            for (int h = 0; h < output.GetLength(1); h++) // output height
            {
                for (int w = 0; w < output.GetLength(2); w++) // output width (number of characters)
                {
                    int maxIndex = 0;
                    float maxValue = 0;
                    for (int c = 0; c < output.GetLength(2); c++)
                    {
                        if (output[b, h, c] > maxValue)
                        {
                            maxIndex = c;
                            maxValue = output[b, h, c];
                        }
                    }

                    char predictedChar = (char)maxIndex;
                    extractedText += predictedChar;
                }
            }
        }

        return extractedText;
    }
}

Here's the error:

An error occurred while performing OCR: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)
   at OcrSolution.API.Controllers.OcrController.PerformOCR(IFormFile imageFile) in C:\Users\k.mimouni\Desktop\ocr web app\sw-kamilia-2023\components\Server\OcrSolution.API\Controllers\OcrController.cs:line 95

@Kamilya2020

My ONNX model is from this link: JaidedAI/EasyOCR#786

@nisargmehta-groww

I am facing this error when using the ONNX export code for "trocr-large-printed". Can someone please help?

model_ckpt = "microsoft/trocr-large-printed"
!python -m transformers.onnx --model={model_ckpt} --feature=vision2seq-lm onnx/ --atol 1e-2
usage: Hugging Face Transformers ONNX exporter [-h] -m MODEL
                                               [--feature {causal-lm,causal-lm-with-past,default,default-with-past,image-classification,masked-lm,question-answering,seq2seq-lm,seq2seq-lm-with-past,sequence-classification,token-classification}]
                                               [--opset OPSET] [--atol ATOL] [--framework {pt,tf}] [--cache_dir CACHE_DIR]
                                               output
Hugging Face Transformers ONNX exporter: error: argument --feature: invalid choice: 'vision2seq-lm' (choose from 'causal-lm', 'causal-lm-with-past', 'default', 'default-with-past', 'image-classification', 'masked-lm', 'question-answering', 'seq2seq-lm', 'seq2seq-lm-with-past', 'sequence-classification', 'token-classification')

@donjuanpond

Hi @nisargmehta-groww, I used the Python library instead of the CLI to do this conversion. Here's my code (my custom-trained TrOCR is saved at /content/content/TrOCR-model-initial-cut; you should be able to just replace that with microsoft/trocr-large-printed when you run it).

from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor

# Load the pretrained TrOCR model
model_id = "/content/content/TrOCR-model-initial-cut"
model = ORTModelForVision2Seq.from_pretrained(model_id, export=True, task="image-to-text", use_cache=False)
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# Save the ONNX model
model.save_pretrained("onnx_trocr_model")
processor.save_pretrained("onnx_trocr_model")
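
Loading that saved export back and running generation might then look roughly like this; the image path is a placeholder and the generate defaults are assumptions:

from PIL import Image
from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor

# Load the exported ONNX model and the saved processor from the same directory.
model = ORTModelForVision2Seq.from_pretrained("onnx_trocr_model", use_cache=False)
processor = TrOCRProcessor.from_pretrained("onnx_trocr_model")

image = Image.open("text_line.png").convert("RGB")  # placeholder image path
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])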
