Add onnx export cuda support #17183

JingyaHuang · 2022-05-11T15:32:45Z

What does this PR do?

Add CUDA support for transformers.onnx.export_pytorch.
Add test for transformers.onnx.export_pytorch on CUDA.

Context

While executing optimum.ORTTrainer with --deepspeed and --fp16 enabled, the export to ONNX fails because not all layers of the model are implemented for half precision. Tracing on CUDA is needed as a workaround.

Who can review?

@michaelbenayoun @lewtun

HuggingFaceDocBuilderDev · 2022-05-11T15:48:39Z

The documentation is not available anymore as the PR was closed or merged.

michaelbenayoun

Great, thanks for adding this @JingyaHuang!

If I understand correctly, this enables tracing half-precision models?

JingyaHuang · 2022-05-12T09:16:49Z

> Great, thanks for adding this @JingyaHuang!
>
> If I understand correctly, this enables tracing half-precision models?

Hi @michaelbenayoun,
Yes, but only for PyTorch since tf2onnx has specified the device to be CPU.
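
To make the PyTorch-only restriction concrete, here is a minimal sketch of a device check an exporter could run before tracing. The helper name `check_export_device` and the framework labels `"pt"` / `"tf"` are hypothetical and are not taken from the PR's code; only the rule itself (CUDA for PyTorch, CPU only for the tf2onnx path) comes from the discussion above.

```python
def check_export_device(framework: str, device: str = "cpu") -> str:
    """Return the device to export on, or raise if the combination is unsupported.

    Hypothetical helper: `framework` is "pt" (PyTorch) or "tf" (TensorFlow).
    """
    if device not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported device: {device!r}")
    # Per the discussion, the tf2onnx path is pinned to CPU, so CUDA is rejected.
    if framework == "tf" and device != "cpu":
        raise RuntimeError("Export through tf2onnx is only supported on CPU.")
    return device


# PyTorch may export on either device; TensorFlow falls back to the CPU default.
pytorch_device = check_export_device("pt", "cuda")
tensorflow_device = check_export_device("tf")
```

Failing early like this keeps the error close to the caller instead of surfacing later as a device mismatch during tracing.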

LysandreJik

LGTM

src/transformers/onnx/convert.py

Co-authored-by: Lysandre Debut <[email protected]>

lewtun

Thanks for this feature @JingyaHuang !

I've left a question about using a device arg instead of cuda. Let me know what you think (and feel free to disagree) :)

src/transformers/onnx/convert.py

Co-authored-by: lewtun <[email protected]>

lewtun

Thanks for iterating on this @JingyaHuang !

I've left a few final nits, but this is looking really nice :)

Could you please confirm that the slow tests pass on both CPU and GPU devices?

RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py

src/transformers/onnx/convert.py

JingyaHuang · 2022-05-16T23:02:12Z

> Thanks for iterating on this @JingyaHuang !
>
> I've left a few final nits, but this is looking really nice :)
>
> Could you please confirm that the slow tests pass on both CPU and GPU devices?
>
> RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py

Hi @lewtun, I ran the slow tests on CPU and GPU and got the following results. Some models and tasks failed. I am trying to find the root cause now.

======================================================================================== short test summary info ========================================================================================
FAILED tests/onnx/test_onnx_v2.py::OnnxExportTestCaseV2::test_pytorch_export_12_bert_next_sentence_prediction - ValueError: next-sentence-prediction is not a supported task, supported tasks: dict_ke...
FAILED tests/onnx/test_onnx_v2.py::OnnxExportTestCaseV2::test_pytorch_export_71_mobilebert_next_sentence_prediction - ValueError: next-sentence-prediction is not a supported task, supported tasks: d...
FAILED tests/onnx/test_onnx_v2.py::OnnxExportTestCaseV2::test_pytorch_export_on_cuda_12_bert_next_sentence_prediction - ValueError: next-sentence-prediction is not a supported task, supported tasks:...
FAILED tests/onnx/test_onnx_v2.py::OnnxExportTestCaseV2::test_pytorch_export_on_cuda_20_big_bird_question_answering - AssertionError: big-bird, question-answering -> Expected all tensors to be on th...
FAILED tests/onnx/test_onnx_v2.py::OnnxExportTestCaseV2::test_pytorch_export_on_cuda_71_mobilebert_next_sentence_prediction - ValueError: next-sentence-prediction is not a supported task, supported ...
========================================================== 5 failed, 177 passed, 77 skipped, 43 deselected, 158 warnings in 2478.21s (0:41:18) ==========================================================

lewtun · 2022-05-17T09:59:09Z

Oh yes, we recently reverted the next-sentence-prediction feature in #17276, so rebasing on main should fix those. The BigBird error looks more related to your PR, so let me know if you need some help debugging it :)

JingyaHuang · 2022-05-17T15:36:21Z

> Oh yes, we recently reverted the next-sentence-prediction feature in #17276, so rebasing on main should fix those. The BigBird error looks more related to your PR, so let me know if you need some help debugging it :)

Hi @lewtun, thanks for the details. After rebasing, all checks for bert passed. The Big Bird problem comes from a bug in the modeling code: when token_type_ids is created, the device is not specified, which leads to a device mismatch. I just fixed that. Now all pytorch_export checks pass on both CPU and CUDA.
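
The kind of fix described above can be sketched as follows. This is a generic illustration that assumes PyTorch is installed, not the actual Big Bird patch; the function name `default_token_type_ids` is hypothetical. The point is that a tensor created inside the model should inherit its device from an existing input rather than defaulting to CPU.

```python
import torch


def default_token_type_ids(input_ids: torch.Tensor) -> torch.Tensor:
    """Build all-zero token_type_ids on the same device as input_ids.

    Hypothetical helper: omitting `device=` would create the tensor on CPU,
    which mismatches the inputs when the model is traced on CUDA.
    """
    return torch.zeros(input_ids.shape, dtype=torch.long, device=input_ids.device)


# Runs on CPU here; with CUDA inputs the result would follow them to the GPU.
input_ids = torch.ones((2, 4), dtype=torch.long)
token_type_ids = default_token_type_ids(input_ids)
```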

lewtun · 2022-05-18T12:11:25Z

> After rebasing, all checks for bert passed.

Cool! Just to double-check, did you run the tests:

- On a CPU machine (no GPU or CUDA installed)
- On a GPU machine

I'd like to be sure we don't accidentally break the test suite for developers coding on CPU machines :)
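
One common way to keep a suite green on CPU-only machines is to skip CUDA tests when no device is present. The sketch below uses only the standard library's `unittest`; the names `cuda_is_available`, `require_cuda`, and `ExportDeviceTest` are hypothetical, the test bodies are placeholders, and the repository's own test helpers may differ.

```python
import unittest


def cuda_is_available() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Decorator that skips a test instead of failing it on CPU-only machines.
require_cuda = unittest.skipUnless(cuda_is_available(), "requires a CUDA device")


class ExportDeviceTest(unittest.TestCase):
    def test_export_on_cpu(self):
        # Placeholder body: a real test would export on CPU and validate the output.
        self.assertTrue(True)

    @require_cuda
    def test_export_on_cuda(self):
        # Placeholder body: only reached when a CUDA device is available.
        self.assertTrue(cuda_is_available())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExportDeviceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On a machine without a GPU the CUDA test is reported as skipped, so the run still succeeds.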

lewtun · 2022-05-18T13:58:21Z

examples/research_projects/lxmert/demo.ipynb

@@ -6,7 +6,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "#%pip install-r requirements.txt"
+    "# %pip install-r requirements.txt"


One last question: do you know why the style formatter is touching these files that aren't related to the PR? Ideally we should exclude these changes if possible

I did make style and then I saw this warning:

Skipping .ipynb files as Jupyter dependencies are not installed. You can fix this by running ``pip install black[jupyter]``

So I installed black[jupyter] in case we want to format the notebooks (maybe not the case?), and then it formatted the notebooks...

Do we format the notebooks with black?

lewtun

Thanks for iterating on this @JingyaHuang - LGTM!


JingyaHuang added 3 commits May 11, 2022 12:19

Add CUDA support

126d7e3

Raise if TF

296d590

Add test for PyTorch export on CUDA

756e5b6

JingyaHuang requested review from lewtun and michaelbenayoun May 11, 2022 15:32

JingyaHuang changed the title ~~Jingya/onnx export cuda~~ onnx export cuda support May 11, 2022

JingyaHuang changed the title ~~onnx export cuda support~~ Add onnx export cuda support May 11, 2022

Fix NameError

f1c9dd4

michaelbenayoun approved these changes May 12, 2022

LysandreJik approved these changes May 12, 2022

JingyaHuang and others added 2 commits May 13, 2022 00:01

Fix typo

6a96f3a

Co-authored-by: Lysandre Debut <[email protected]>

Fix typo

3c1813c

Co-authored-by: Lysandre Debut <[email protected]>

lewtun reviewed May 13, 2022

JingyaHuang and others added 5 commits May 13, 2022 15:55

Update src/transformers/onnx/convert.py

f105ad2

Co-authored-by: lewtun <[email protected]>

Change arg from cuda to device

898d827

Resolve conflicts

6906db3

fix style

796c67d

fix test

dde6481

lewtun reviewed May 16, 2022

Improve docstring

904ab89

JingyaHuang added 2 commits May 16, 2022 23:07

Merge branch 'main' into jingya/onnx-export-cuda

564cc11

Fix style

8dbf073

Merge branch 'main' into jingya/onnx-export-cuda

ff34c30

JingyaHuang added 2 commits May 17, 2022 15:41

Fix bug in Big Bird

3a4833d

Make fix-copies

4413ee0

JingyaHuang mentioned this pull request May 18, 2022

Patch ORTTrainer's compatibility with DeepSpeed huggingface/optimum#148

Merged

9 tasks

lewtun reviewed May 18, 2022

JingyaHuang added 3 commits May 18, 2022 16:37

Revert notebook style

d4137b4

Merge branch 'main' into jingya/onnx-export-cuda

ee48136

revert notebooks

fc561fa

lewtun approved these changes May 18, 2022

lewtun merged commit 6da76b9 into huggingface:main May 18, 2022

JingyaHuang deleted the jingya/onnx-export-cuda branch May 24, 2022 18:55

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

Add onnx export cuda support (huggingface#17183)

06ae39b

Co-authored-by: Lysandre Debut <[email protected]> Co-authored-by: lewtun <[email protected]>
