ORT 1.23.1 cherrypick 2 #26182

adrianlizarraga · 2025-09-26T21:19:13Z

Description

Adds the following commits to the rel-1.23.1 branch for ORT 1.23.1:

add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart
- main merge date: July 31, 1:05am
- pr: add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart #25590
- commit: e753643
[build] fix WebAssembly build on macOS/arm64
- main merge date: Aug 5, 8:07am
- pr: [build] fix WebAssembly build on macOS/arm64 #25653
- commit: 53f152b
[CPU] MoE Kernel (#25958)
- main merge date: Sept 10, 4:54pm
- pr: [CPU] MoE Kernel #25958
- commit: 930e640
[CPU] Block-wise QMoE kernel for CPU
- main merge date: Sept 15, 8:32am
- pr: [CPU] Block-wise QMoE kernel for CPU #26009
- commit: 5d17734
[C#] Implement missing APIs
- main merge date: Sept 24, 10:50am
- pr: [C#] Implement missing APIs #26101
- commit: 35dcab5
Regenerate test model with ONNX IR < 12
- main merge date: Sept 24, 2:50pm
- pr: Regenerate test model with ONNX IR < 12 #26149
- commit: 88f2652
[CPU] Fix compilation errors because of unused variables
- main merge date: Sept 25, 1:21pm
- pr: [CPU] Fix compilation errors because of unused variables #26147
- commit: 42fcd71
[EP ABI] Check if nodes specified in GetCapability() have already been assigned
- main merge date: Sept 26, 1:24am
- pr: [EP ABI] Check if nodes specified in GetCapability() have already been assigned #26156
- commit: 67d3ba0
[QNN EP] Add dynamic option to set HTP performance mode
- main merge date: Sept 26, 11:55am
- pr: [QNN EP] Add dynamic option to set HTP performance mode #26135
- commit: 6cc40fd

…25590)

### Description

Use the session ID to correlate these events with LogSessionCreation. If Run is called from different threads, the calls can be told apart by thread ID, given that Run is not async.

---------

Co-authored-by: hualxie <[email protected]>
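The correlation idea above can be illustrated with a minimal Python sketch. It is not the ORT telemetry code: `TelemetryLog`, `run_once`, and `group_by_session_and_thread` are hypothetical names, and the event strings only mirror the ones mentioned in the PR title. Each event records a session ID and the calling thread's ID, so events can later be grouped per session and per thread.

```python
import threading
from collections import defaultdict


class TelemetryLog:
    """Hypothetical in-memory stand-in for a telemetry sink."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def emit(self, name, session_id):
        # Tag every event with the session ID and the calling thread's ID.
        with self._lock:
            self.events.append(
                {
                    "event": name,
                    "session_id": session_id,
                    "thread_id": threading.get_ident(),
                }
            )


def run_once(log, session_id):
    """Simulate one synchronous Run call bracketed by start/stop events."""
    log.emit("EvaluationStart", session_id)
    # ... inference would happen here ...
    log.emit("EvaluationStop", session_id)


def group_by_session_and_thread(events):
    """Group event names by (session_id, thread_id), preserving order."""
    grouped = defaultdict(list)
    for e in events:
        grouped[(e["session_id"], e["thread_id"])].append(e["event"])
    return dict(grouped)
```

Because a synchronous Run starts and stops on the same thread, the pair (session ID, thread ID) is enough to match each start event with its stop event in this sketch.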

### Description

Fix the WebAssembly build on macOS/arm64 by no longer appending `-Donnxruntime_USE_KLEIDIAI=ON` to `cmake_args`. KleidiAI should not be enabled for WebAssembly builds.
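A minimal sketch of the guard this fix describes, assuming the flag was previously appended for macOS/arm64 hosts regardless of target. `kleidiai_cmake_args` and its parameters are hypothetical and are not taken from the actual build script; only the flag string comes from the description above.

```python
def kleidiai_cmake_args(is_macos, machine, build_wasm):
    """Return extra CMake args for KleidiAI (hypothetical helper).

    Assumption: KleidiAI is only turned on for native macOS/arm64 builds
    and is skipped whenever the target is WebAssembly.
    """
    args = []
    if is_macos and machine == "arm64" and not build_wasm:
        args.append("-Donnxruntime_USE_KLEIDIAI=ON")
    return args
```

For example, `kleidiai_cmake_args(True, "arm64", build_wasm=True)` returns an empty list, while the same call with `build_wasm=False` returns the flag.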

CPU MoE Kernel

```
name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 16, max_diff: 2.682209014892578e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 32, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 16, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 32, max_diff: 4.172325134277344e-07
.MoE CPU kernel time: 15.721677541732786 ms
.
----------------------------------------------------------------------
Ran 5 tests in 30.217s
```

This PR adds a block-wise quantization kernel for QMoE on CPU.
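For readers unfamiliar with the term, block-wise quantization stores one scale per fixed-size block of weights instead of one scale per row or tensor. The sketch below shows a generic symmetric scheme in plain Python. It is an assumption for illustration only: the PR description does not state the bit width, block size, packing, or whether zero points are used, and `quantize_blockwise` / `dequantize_blockwise` are hypothetical names.

```python
def quantize_blockwise(row, block_size, bits=4):
    """Symmetric block-wise quantization of a list of floats.

    Returns (quantized_ints, per_block_scales). Generic sketch; not the
    layout used by the ORT QMoE kernel.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit values
    quantized, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        amax = max(abs(v) for v in block)
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = amax / qmax if amax > 0 else 1.0
        scales.append(scale)
        for v in block:
            q = round(v / scale)
            quantized.append(max(-qmax - 1, min(qmax, q)))
    return quantized, scales


def dequantize_blockwise(quantized, scales, block_size):
    """Reconstruct approximate floats from ints and per-block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]
```

The per-block scale keeps a block of small weights from being flattened to zero by a large outlier elsewhere in the same row, which is the usual motivation for choosing block-wise over per-row scales.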

This pull request adds new APIs and updates existing ones to improve memory and device information handling in the ONNX Runtime C# bindings. The most significant changes introduce methods for fetching memory info and device info for session inputs/outputs, and add support for shared allocators and synchronization streams. There are also several updates and renamings for LoraAdapter delegates and related APIs.

### Memory and Device Info APIs

* Added `GetMemoryInfosForInputs`, `GetMemoryInfosForOutputs`, and `GetEpDeviceForInputs` methods to `InferenceSession.shared.cs` to fetch memory info and device info for session inputs/outputs. These methods utilize new native delegates for retrieving memory and device information.
* Introduced native delegates in `NativeMethods.shared.cs` for `OrtSessionGetMemoryInfoForInputs`, `OrtSessionGetMemoryInfoForOutputs`, and `OrtSessionGetEpDeviceForInputs`, and wired them up in the static constructor.

### Shared Allocator and Synchronization Stream Support

* Added delegates and static fields for creating, getting, and releasing shared allocators, as well as for creating and managing synchronization streams (`OrtCreateSharedAllocator`, `OrtGetSharedAllocator`, `OrtReleaseSharedAllocator`, `OrtCreateSyncStreamForEpDevice`, `OrtSyncStream_GetHandle`, `OrtReleaseSyncStream`).
* Added delegate for copying tensors (`OrtCopyTensors`).

### LoraAdapter API Updates

* Renamed LoraAdapter-related delegates to use the `Ort` prefix (`OrtCreateLoraAdapter`, `OrtCreateLoraAdapterFromArray`, `OrtReleaseLoraAdapter`) and updated their usage throughout the codebase.

### MemoryInfo Enhancements

* Added new delegates for creating memory info with more parameters (`OrtCreateMemoryInfoV2`), and for querying device memory type and vendor ID (`OrtMemoryInfoGetDeviceMemType`, `OrtMemoryInfoGetVendorId`).

### Minor API Documentation Update

* Clarified the lifetime of allocators in the documentation, noting they can be explicitly unregistered.

### Description

- Regenerates the `input_propagate_to_output.onnx` model used in [this unit test](https://github.com/microsoft/onnxruntime/blob/35dcab5088118117acc6086c9b6dd6dd92c7060f/onnxruntime/test/shared_lib/test_inference.cc#L497-L506) so that it uses an ONNX IR version compatible with ONNX 1.18.0 (i.e., IR version < 12).
- Adds script `input_propagate_to_output.py` that can be used to regenerate the `input_propagate_to_output.onnx` model.
- Embeds missing weight values that are needed to run the existing `test_dangling_input_segment_ids.py` script.

### Motivation and Context

The main branch is using ONNX 1.19. However, this unit test also needs to pass in the `rel-1.23.1` branch, which is still using ONNX 1.18.0. So, by downgrading the model's IR version, the unit test can run in both branches.

See original PR that added the test models: #26021

This PR fixes a few unused variables.

…n assigned (#26156)

### Description

Fixes segfault in `PluginExecutionProvider::GetCapability()` when the underlying `OrtEp` tries to claim nodes that have already been assigned to another EP.

### Motivation and Context

Should log a warning (instead of crashing or throwing an exception) when a plugin EP tries to claim a node that is already assigned to another EP.

---------

Co-authored-by: Edward Chen <[email protected]>
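The behavior described above (warn and skip rather than crash) can be sketched in Python as follows. This is an illustration only, not the C++ change in `PluginExecutionProvider::GetCapability()`: the function `filter_claimed_nodes`, the logger name, and the dictionary-based assignment table are all hypothetical.

```python
import logging

logger = logging.getLogger("ep_assignment_sketch")


def filter_claimed_nodes(ep_name, claimed_nodes, assignments):
    """Return the nodes an EP may actually take.

    `assignments` maps node name -> name of the EP that already owns it
    (hypothetical data structure). Nodes that already have an owner are
    skipped with a warning instead of raising.
    """
    accepted = []
    for node in claimed_nodes:
        owner = assignments.get(node)
        if owner is not None:
            logger.warning(
                "%s tried to claim node %s, which is already assigned to %s",
                ep_name, node, owner,
            )
            continue
        accepted.append(node)
    return accepted
```

The existing assignment table is never modified in this sketch, so an earlier EP's claim always wins and the later claim is simply dropped.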

### Description

Add a new EP dynamic option to set HTP performance mode after session creation.

---------

Co-authored-by: quic-ashwshan <[email protected]>

xieofxie and others added 10 commits September 26, 2025 14:11

[build] fix WebAssembly build on macOS/arm64 (#25653)

04f0fff


[CPU] Block-wise QMoE kernel for CPU (#26009)

919b894


[CPU] Fix compilation errors because of unused variables (#26147)

5b5bf39


[QNN EP] Add dynamic option to set HTP performance mode (#26135)

9062599


Re-enable inference tests that test the I/O memory info C APIs

41b4da6

adrianlizarraga requested review from HectorSVC, apsonawane, snnn and yuslepukhin September 26, 2025 21:21

yuslepukhin approved these changes Sep 26, 2025


snnn approved these changes Sep 26, 2025


HectorSVC approved these changes Sep 26, 2025


apsonawane approved these changes Sep 26, 2025


snnn merged commit d9b2048 into rel-1.23.1 Sep 27, 2025
74 of 75 checks passed

snnn deleted the adrianl/rel-1.23.1-cherrypick-2 branch September 27, 2025 03:28


8 participants
