Bug Description and Reproduction
Describe the Bug
When running long-term inference with onnxRuntimeGenAI.QNN, the application crashes on multiple machines after 1-2 hours of continuous inference. Even when a machine survives that long, issues such as garbled responses or missing responses occur.
Crash log:
Application: dynabook Assistant.exe
CoreCLR Version: 8.0.1925.36514
.NET Version: 8.0.19
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Stack:
at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
at Microsoft.ML.OnnxRuntimeGenAI.Tokenizer.Encode(System.String)
at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.System.Collections.Generic.IAsyncEnumerator<Microsoft.Extensions.AI.ChatResponseUpdate>.MoveNextAsync()
at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0+<<InferStreaming>b__1>d.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0.<InferStreaming>b__1()
at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
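For reference, the failing path in the trace is a streaming call through Microsoft.Extensions.AI's IChatClient into the GenAI tokenizer. Below is a hypothetical minimal reconstruction of that call shape; the model path and prompt are placeholders and the constructor usage is an assumption, only the API names come from the stack:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntimeGenAI;

// "model_dir" is a placeholder for the local QNN model folder.
using IChatClient client = new OnnxRuntimeGenAIChatClient("model_dir");

var messages = new List<ChatMessage> { new(ChatRole.User, "Hello") };

// GetStreamingResponseAsync tokenizes the prompt internally;
// OgaTokenizerEncode is the native frame that faults in the log above.
await foreach (ChatResponseUpdate update in client.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}
```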
To Reproduce
Steps to reproduce the issue:
- Initialize the model
- Repeat the inference process in a loop (start inference → wait for inference to finish → start the next inference); a minimal sketch of this loop follows below
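A minimal standalone loop matching these steps, sketched with the library's low-level API; the model directory, prompt template, and search options are placeholders, not values taken from the affected application:

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

// Placeholder path to the QNN model directory.
using var model = new Model("model_dir");
using var tokenizer = new Tokenizer(model);

// Run inferences back to back; the crash reportedly appears after 1-2 hours.
while (true)
{
    // Placeholder prompt/template, not the application's real input.
    using var sequences = tokenizer.Encode("<|user|>Hello<|end|><|assistant|>");

    using var generatorParams = new GeneratorParams(model);
    generatorParams.SetSearchOption("max_length", 256);

    using var generator = new Generator(model, generatorParams);
    generator.AppendTokenSequences(sequences);

    using var stream = tokenizer.CreateStream();
    while (!generator.IsDone())
    {
        generator.GenerateNextToken();
        // Decode only the newest token; garbled output would surface here.
        ReadOnlySpan<int> seq = generator.GetSequence(0);
        Console.Write(stream.Decode(seq[^1]));
    }
    Console.WriteLine();
}
```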
Expected Behavior
The application does not crash, and responses remain valid during long-running inference.
Desktop
- OS: Windows 11 Home 25H2 26200.6588
- OnnxRuntimeGenAI.QNN: 0.10.0
- NPU Driver: 30.0.140.1000/30.0.145.1000