ORT 1.23.1 cherrypick 2 #26182
Merged
…25590)

### Description

Use the session ID to track them with `LogSessionCreation`. If `Run` is called from different threads, the calls can be differentiated by thread ID, given that `Run` is not async.

---------

Co-authored-by: hualxie <[email protected]>
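The idea of tagging log events with a session ID plus the calling thread's ID can be sketched as follows. This is only an illustration in Python, not the ONNX Runtime logging code; `format_run_event` and `run` are hypothetical names introduced here.

```python
import threading
from typing import List


def format_run_event(session_id: int, message: str) -> str:
    """Prefix a log message with the session id and the calling thread's id.

    Hypothetical helper for illustration only.
    """
    thread_id = threading.get_ident()
    return f"[session={session_id} thread={thread_id}] {message}"


def run(session_id: int, events: List[str]) -> None:
    """Stand-in for a synchronous Run call that records start/end events."""
    # A synchronous call stays on the thread that invoked it, so the
    # start and end events carry the same thread id.
    events.append(format_run_event(session_id, "Run start"))
    events.append(format_run_event(session_id, "Run end"))


if __name__ == "__main__":
    collected: List[str] = []
    workers = [threading.Thread(target=run, args=(7, collected)) for _ in range(2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    for line in collected:
        print(line)
```

Because the session ID is shared by all runs of one session, it links the events back to the session creation record, while the thread ID separates runs that overlap in time.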
### Description

Fix the WebAssembly build on macOS/arm64 by no longer appending `-Donnxruntime_USE_KLEIDIAI=ON` to `cmake_args`. KleidiAI should not be enabled for WebAssembly builds.
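A minimal sketch of the kind of guard this describes, assuming the build script decides on the flag from the host architecture and the build target. The function name and its parameters are hypothetical; only the flag string comes from the description above.

```python
from typing import List

KLEIDIAI_FLAG = "-Donnxruntime_USE_KLEIDIAI=ON"


def kleidiai_cmake_args(is_arm64_host: bool, build_wasm: bool) -> List[str]:
    """Return the KleidiAI cmake argument only when it applies.

    Hypothetical helper: the real build script may use different
    conditions and names.
    """
    # Skip the flag for WebAssembly builds, even on an arm64 host.
    if is_arm64_host and not build_wasm:
        return [KLEIDIAI_FLAG]
    return []


if __name__ == "__main__":
    cmake_args: List[str] = []
    cmake_args += kleidiai_cmake_args(is_arm64_host=True, build_wasm=True)
    print(cmake_args)
```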
CPU MoE Kernel

```
name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 16, max_diff: 2.682209014892578e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 32, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 16, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 32, max_diff: 4.172325134277344e-07
.MoE CPU kernel time: 15.721677541732786 ms
.
----------------------------------------------------------------------
Ran 5 tests in 30.217s
```
This PR adds a block-wise quantization kernel for QMoE on CPU.
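For readers unfamiliar with the term, block-wise quantization in general can be sketched as below: each fixed-size block of weights gets its own scale. This is a generic symmetric scheme written for illustration. The bit width, block size, rounding, and function names are assumptions, and none of it is taken from the QMoE kernel itself.

```python
from typing import List, Tuple


def quantize_blockwise(
    values: List[float], block_size: int, bits: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric per-block quantization (illustrative, not the ORT kernel)."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed values
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for v in block:
            q = round(v / scale)
            quantized.append(max(-qmax, min(qmax, q)))
    return quantized, scales


def dequantize_blockwise(
    quantized: List[int], scales: List[float], block_size: int
) -> List[float]:
    """Recover approximate values using each block's own scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


if __name__ == "__main__":
    weights = [0.1, -0.7, 0.35, 0.0, 10.0, -5.0, 2.5, 0.0]
    q, s = quantize_blockwise(weights, block_size=4)
    print(q, s)
    print(dequantize_blockwise(q, s, block_size=4))
```

Keeping one scale per block means a large value in one block does not reduce the precision available to the other blocks.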
This pull request adds new APIs and updates existing ones to improve memory and device information handling in the ONNX Runtime C# bindings. The most significant changes introduce methods for fetching memory info and device info for session inputs/outputs, and add support for shared allocators and synchronization streams. There are also several updates and renamings for LoraAdapter delegates and related APIs.

### Memory and Device Info APIs

* Added `GetMemoryInfosForInputs`, `GetMemoryInfosForOutputs`, and `GetEpDeviceForInputs` methods to `InferenceSession.shared.cs` to fetch memory info and device info for session inputs/outputs. These methods use new native delegates for retrieving memory and device information.
* Introduced native delegates in `NativeMethods.shared.cs` for `OrtSessionGetMemoryInfoForInputs`, `OrtSessionGetMemoryInfoForOutputs`, and `OrtSessionGetEpDeviceForInputs`, and wired them up in the static constructor.

### Shared Allocator and Synchronization Stream Support

* Added delegates and static fields for creating, getting, and releasing shared allocators, as well as for creating and managing synchronization streams (`OrtCreateSharedAllocator`, `OrtGetSharedAllocator`, `OrtReleaseSharedAllocator`, `OrtCreateSyncStreamForEpDevice`, `OrtSyncStream_GetHandle`, `OrtReleaseSyncStream`).
* Added a delegate for copying tensors (`OrtCopyTensors`).

### LoraAdapter API Updates

* Renamed LoraAdapter-related delegates to use the `Ort` prefix (`OrtCreateLoraAdapter`, `OrtCreateLoraAdapterFromArray`, `OrtReleaseLoraAdapter`) and updated their usage throughout the codebase.

### MemoryInfo Enhancements

* Added new delegates for creating memory info with more parameters (`OrtCreateMemoryInfoV2`), and for querying device memory type and vendor ID (`OrtMemoryInfoGetDeviceMemType`, `OrtMemoryInfoGetVendorId`).

### Minor API Documentation Update

* Clarified the lifetime of allocators in the documentation, noting they can be explicitly unregistered.
### Description

- Regenerates the `input_propagate_to_output.onnx` model used in [this unit test](https://github.com/microsoft/onnxruntime/blob/35dcab5088118117acc6086c9b6dd6dd92c7060f/onnxruntime/test/shared_lib/test_inference.cc#L497-L506) so that it uses an ONNX IR version compatible with ONNX 1.18.0 (i.e., IR version < 12).
- Adds the script `input_propagate_to_output.py`, which can be used to regenerate the `input_propagate_to_output.onnx` model.
- Embeds missing weight values that are needed to run the existing `test_dangling_input_segment_ids.py` script.

### Motivation and Context

The main branch is using ONNX 1.19. However, this unit test also needs to pass in the `rel-1.23.1` branch, which is still using ONNX 1.18.0. Downgrading the model's IR version lets the unit test run in both branches.

See the original PR that added the test models: #26021
This PR fixes a few unused variables.
…n assigned (#26156)

### Description

Fixes a segfault in `PluginExecutionProvider::GetCapability()` when the underlying `OrtEp` tries to claim nodes that have already been assigned to another EP.

### Motivation and Context

ONNX Runtime should log a warning (instead of crashing or throwing an exception) when a plugin EP tries to claim a node that is already assigned to another EP.

---------

Co-authored-by: Edward Chen <[email protected]>
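The intended behavior (warn and skip rather than crash) can be sketched in Python as follows. This is an illustration of the policy only, not the C++ code in `PluginExecutionProvider::GetCapability()`; `claim_nodes`, the node names, and the EP names are hypothetical.

```python
import logging
from typing import Dict, Iterable, List

logger = logging.getLogger("ep_assignment_sketch")


def claim_nodes(
    assignments: Dict[str, str], ep_name: str, requested: Iterable[str]
) -> List[str]:
    """Assign requested nodes to ep_name, skipping nodes owned by another EP.

    Hypothetical helper: assignments maps node name -> owning EP name.
    """
    claimed: List[str] = []
    for node in requested:
        owner = assignments.get(node)
        if owner is not None and owner != ep_name:
            # Warn and move on instead of raising or crashing.
            logger.warning(
                "EP %s tried to claim node %s, already assigned to %s; skipping",
                ep_name, node, owner,
            )
            continue
        assignments[node] = ep_name
        claimed.append(node)
    return claimed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    table = {"conv1": "EP_A"}
    print(claim_nodes(table, "EP_B", ["conv1", "relu1"]))
    print(table)
```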
### Description

Add a new EP dynamic option to set the HTP performance mode after session creation.

---------

Co-authored-by: quic-ashwshan <[email protected]>
yuslepukhin
approved these changes
Sep 26, 2025
snnn
approved these changes
Sep 26, 2025
HectorSVC
approved these changes
Sep 26, 2025
apsonawane
approved these changes
Sep 26, 2025
Description
Adds the following commits to the `rel-1.23.1` branch for ORT 1.23.1: