onnxRuntimeGenAI.QNN Long-Term Inference Crash and Memory Anomalies Issue Report #1830

@suki-lqh

Description

Describe the Bug

When using onnxRuntimeGenAI.QNN for long-term inference, multiple machines crash after 1-2 hours of continuous inference. Even machines that survive that long exhibit issues such as garbled responses or no response at all. A representative crash log:


Application: dynabook Assistant.exe  
CoreCLR Version: 8.0.1925.36514  
.NET Version: 8.0.19  
Description: The process was terminated due to an unhandled exception.  
Exception Info: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.  
Stack:
   at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
   at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
   at Microsoft.ML.OnnxRuntimeGenAI.Tokenizer.Encode(System.String)
   at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
   at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.System.Collections.Generic.IAsyncEnumerator<Microsoft.Extensions.AI.ChatResponseUpdate>.MoveNextAsync()
   at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0+<<InferStreaming>b__1>d.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
   at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0.<InferStreaming>b__1()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
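
The faulting frame is NativeMethods.OgaTokenizerEncode, reached through Tokenizer.Encode inside OnnxRuntimeGenAIChatClient.GetStreamingResponseAsync, so the access violation is raised during tokenization, though the underlying corruption may originate elsewhere. A narrower repro that exercises only the tokenizer may help isolate it. A minimal sketch, assuming the Model/Tokenizer/Sequences types from the Microsoft.ML.OnnxRuntimeGenAI C# bindings (the model path is hypothetical):

    using Microsoft.ML.OnnxRuntimeGenAI;

    const string modelPath = @"C:\models\qnn-model"; // hypothetical QNN model directory
    using var model = new Model(modelPath);
    using var tokenizer = new Tokenizer(model);

    // Encode in a tight loop; if the tokenizer alone is at fault, this should
    // eventually hit the same AccessViolationException.
    for (long i = 0; ; i++)
    {
        using Sequences sequences = tokenizer.Encode("Test prompt " + i);
    }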

To Reproduce

Steps to reproduce the issue:

  1. Initialize the model
  2. Repeat the inference process (start inference → wait for inference to finish → start the next inference); a minimal loop matching these steps is sketched below
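
A minimal sketch of that loop, assuming the OnnxRuntimeGenAIChatClient(string modelPath) constructor and the string-prompt GetStreamingResponseAsync extension from Microsoft.Extensions.AI (model path and prompt are hypothetical):

    using Microsoft.Extensions.AI;
    using Microsoft.ML.OnnxRuntimeGenAI;

    const string modelPath = @"C:\models\qnn-model"; // hypothetical QNN model directory

    // Step 1: initialize the model via the chat client wrapper seen in the stack trace.
    using IChatClient chatClient = new OnnxRuntimeGenAIChatClient(modelPath);

    // Step 2: run inferences back to back; the crash appears after 1-2 hours.
    while (true)
    {
        await foreach (ChatResponseUpdate update in chatClient.GetStreamingResponseAsync("Test prompt"))
        {
            // Consume streamed tokens (a real app would append update.Text).
        }
    }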

Expected Behavior

No application crashes.

Desktop

  • OS: Windows 11 Home 25H2 26200.6588
  • OnnxRuntimeGenAI.QNN: 0.10.0
  • NPU Driver: 30.0.140.1000/30.0.145.1000

