

ke1337 (Contributor) commented Mar 5, 2020

Description: This change fixes #3129.
Motivation and Context

  • When onnxruntime runs as a DLL on Windows, CUDA performs some internal cleanup when the process exits, and any CUDA call made after that point crashes. With the CUDA DLLs delay-loaded, the thread_local destructors, which still call into CUDA, run after this cleanup, which causes the crash.
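The ordering problem above can be sketched in portable C++. This is a minimal simulation under stated assumptions: it does not use CUDA or the Windows loader. `FakeRuntime`, `ThreadHandle` and `SimulateExit` are hypothetical names. `FakeRuntime` stands in for the CUDA runtime's exit-time cleanup, `ThreadHandle` stands in for a thread_local object whose destructor releases a CUDA resource, and the teardown order that the Windows loader actually decides at process exit is imitated here by an explicit call order.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a library (such as the CUDA runtime) that tears down
// its internal state during process exit. Not a real CUDA API.
struct FakeRuntime {
    bool alive = true;
    std::vector<std::string> log;

    void Shutdown() {
        alive = false;
        log.push_back("runtime shutdown");
    }

    // Releasing a resource is only valid while the runtime is alive. A real
    // library would likely crash here; the sketch only records the misuse.
    void Release(const std::string& name) {
        log.push_back(alive ? "released " + name
                            : "ERROR: release of " + name + " after shutdown");
    }
};

// Hypothetical RAII wrapper, standing in for a per-thread cached handle
// (for example, a library handle kept in a thread_local variable).
class ThreadHandle {
public:
    ThreadHandle(FakeRuntime& rt, std::string name) : rt_(rt), name_(std::move(name)) {}
    ~ThreadHandle() { rt_.Release(name_); }
    ThreadHandle(const ThreadHandle&) = delete;
    ThreadHandle& operator=(const ThreadHandle&) = delete;

private:
    FakeRuntime& rt_;
    std::string name_;
};

// Simulates process exit with an explicit teardown order.
// handle_first == true:  the handle's destructor runs before runtime cleanup (safe).
// handle_first == false: runtime cleanup runs first, as described for the delay-loaded case.
std::vector<std::string> SimulateExit(bool handle_first) {
    FakeRuntime rt;
    {
        ThreadHandle h(rt, "blas handle");
        if (!handle_first) {
            rt.Shutdown();  // runtime torn down while the handle is still alive
        }
    }  // ~ThreadHandle runs here and calls back into the runtime
    if (handle_first) {
        rt.Shutdown();
    }
    return rt.log;
}
```

Calling `SimulateExit(true)` logs a normal release followed by the shutdown, while `SimulateExit(false)` logs the shutdown first and then a release against an already torn-down runtime, which corresponds to the crash described in this PR. The fix in this PR does not add such a guard; it removes the delay load so the destructors run before CUDA's cleanup.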

ke1337 requested a review from a team as a code owner March 5, 2020 20:06
ke1337 closed this Mar 5, 2020
ke1337 reopened this Mar 5, 2020
yufenglee (Member)

Do we also need to remove the delay load in C#?

private static string[] cudaDelayLoadedLibs = { "cublas64_10.dll", "cudnn64_7.dll", "curand64_10.dll" };

ke1337 merged commit ade4fa1 into master Mar 5, 2020
ke1337 deleted the kedeng/bug3129 branch March 5, 2020 22:40
yufenglee pushed a commit that referenced this pull request Mar 6, 2020
yufenglee added a commit that referenced this pull request Mar 7, 2020
* Publish release symbols (#3152)

* Publish release symbols

* Publish symbols if IsReleaseBuild

* Disable delayload for cuda dlls (#3147)


* Update Gelu Fusion to support new graph pattern from PyTorch 1.4 (#3148)

* Update GeluFusion to support the graph pattern from PyTorch 1.4.
* Fix a bug where the check for an edge between mul2 and root was missing.
* Update the script to fuse Gelu from PyTorch 1.4.
* Add a test for the Python optimizer.

Co-authored-by: Tiago Koji Castro Shibata <[email protected]>
Co-authored-by: KeDengMS <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>

Development

Successfully merging this pull request may close these issues.

cublas64_100.dll got exception after return from main
