onnxRuntimeGenAI.QNN Long-Term Inference Crash and Memory Anomalies Issue Report #1830

@suki-lqh

Description

Describe the Bug

When using onnxRuntimeGenAI.QNN for long-term inference, multiple machines crash after 1-2 hours of continuous inference. Even machines that survive that long exhibit issues such as garbled responses or no response at all. A representative crash log:


Application: dynabook Assistant.exe  
CoreCLR Version: 8.0.1925.36514  
.NET Version: 8.0.19  
Description: The process was terminated due to an unhandled exception.  
Exception Info: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.  
Stack:
   at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
   at Microsoft.ML.OnnxRuntimeGenAI.NativeMethods.OgaTokenizerEncode(IntPtr, Byte[], IntPtr)
   at Microsoft.ML.OnnxRuntimeGenAI.Tokenizer.Encode(System.String)
   at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
   at Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIChatClient+<GetStreamingResponseAsync>d__13.System.Collections.Generic.IAsyncEnumerator<Microsoft.Extensions.AI.ChatResponseUpdate>.MoveNextAsync()
   at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0+<<InferStreaming>b__1>d.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
   at dynabookSmartHelp.LLMProvider.OnnxRuntimeGenAI+<>c__DisplayClass19_0.<InferStreaming>b__1()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
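
The faulting frame is NativeMethods.OgaTokenizerEncode, reached through Tokenizer.Encode inside OnnxRuntimeGenAIChatClient.GetStreamingResponseAsync, so the access violation is raised during tokenization, though the underlying corruption may originate elsewhere. A narrower repro that exercises only the tokenizer may help isolate it. A minimal sketch, assuming the Model/Tokenizer/Sequences types from the Microsoft.ML.OnnxRuntimeGenAI C# bindings (the model path is hypothetical):

    using Microsoft.ML.OnnxRuntimeGenAI;

    const string modelPath = @"C:\models\qnn-model"; // hypothetical QNN model directory
    using var model = new Model(modelPath);
    using var tokenizer = new Tokenizer(model);

    // Encode in a tight loop; if the tokenizer alone is at fault, this should
    // eventually hit the same AccessViolationException.
    for (long i = 0; ; i++)
    {
        using Sequences sequences = tokenizer.Encode("Test prompt " + i);
    }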

To Reproduce

Steps to reproduce the issue:

  1. Initialize the model
  2. Repeat the inference process (start inference → wait for inference to finish → start the next inference); a minimal loop matching these steps is sketched below
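
A minimal sketch of that loop, assuming the OnnxRuntimeGenAIChatClient(string modelPath) constructor and the string-prompt GetStreamingResponseAsync extension from Microsoft.Extensions.AI (model path and prompt are hypothetical):

    using Microsoft.Extensions.AI;
    using Microsoft.ML.OnnxRuntimeGenAI;

    const string modelPath = @"C:\models\qnn-model"; // hypothetical QNN model directory

    // Step 1: initialize the model via the chat client wrapper seen in the stack trace.
    using IChatClient chatClient = new OnnxRuntimeGenAIChatClient(modelPath);

    // Step 2: run inferences back to back; the crash appears after 1-2 hours.
    while (true)
    {
        await foreach (ChatResponseUpdate update in chatClient.GetStreamingResponseAsync("Test prompt"))
        {
            // Consume streamed tokens (a real app would append update.Text).
        }
    }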

Expected Behavior

No application crashes.

Desktop

  • OS: Windows 11 Home 25H2 26200.6588
  • OnnxRuntimeGenAI.QNN: 0.10.0
  • NPU Driver: 30.0.140.1000/30.0.145.1000

