Onnxrt 1.22 release tag merge #125

TedThemistokleous · 2025-06-04T15:09:25Z

Description

Synchronize the Onnxruntime 1.22 release tag with the ROCm 7.0 internal_testing branch.

Motivation and Context

Required so that our testing and builds use the correct official Onnxruntime 1.22 release tag before the ROCm 7.0 release.

…ps (microsoft#24090) ### Description  Support shape inference for QLinearAdd and QLinearMul ops which were missing in symbolic_shape_infer.py ### Motivation and Context  This change is required to enable shape inference for models with "QLinearAdd" ops which are defined in com.microsoft domain and the shapes of which cannot be inferred using onnx shape_inference alone. Fixes issue microsoft#24028 --------- Signed-off-by: Praveen G <[email protected]>
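The sketch below shows the kind of rule that shape inference applies for these ops. It assumes, without taking this from the PR, that the output of QLinearAdd/QLinearMul has the numpy-style broadcast shape of the two quantized data inputs, A and B. `broadcastShapes` is a hypothetical helper, not the code added to symbolic_shape_infer.py, and symbolic dimensions are modeled as strings.

```javascript
// Hypothetical helper: numpy-style (multidirectional) broadcasting of two shapes.
// Dimensions are numbers (concrete) or strings (symbolic, e.g. 'batch').
function broadcastShapes(a, b) {
  const rank = Math.max(a.length, b.length);
  const out = new Array(rank);
  for (let i = 1; i <= rank; i++) {
    // Align shapes from the right; a missing leading dim behaves like 1.
    const da = i <= a.length ? a[a.length - i] : 1;
    const db = i <= b.length ? b[b.length - i] : 1;
    if (da === db || db === 1) {
      out[rank - i] = da;
    } else if (da === 1) {
      out[rank - i] = db;
    } else {
      // Mismatched dims (including symbolic vs. concrete) are not unified here.
      throw new Error(`cannot broadcast ${da} with ${db}`);
    }
  }
  return out;
}

// Assumed output shape of QLinearAdd(A, ..., B, ...) for these input shapes.
console.log(broadcastShapes(['batch', 3, 1, 224], [3, 224, 1])); // [ 'batch', 3, 224, 224 ]
```

A real implementation also has to merge a symbolic dimension with a concrete one; this sketch simply rejects that case.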

…ine (microsoft#23580) ### Description Follow-up to microsoft#23551. Adds the BrowserStack testing stage for Android to the NuGet packaging pipeline. This stage verifies that the produced NuGet package can be imported and works correctly on an Android device. [Pipeline run that shows what a failing unit test would look like](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=670961&view=results) --------- Co-authored-by: Edward Chen <[email protected]>

### Description Add fp16 support to sparse attention ### Motivation and Context Generalize models for CPU and GPU

### Description This PR refactors the mac CI pipeline: - Use a composite action and a reusable workflow to consolidate duplicated code - Separate each EP

### Description Create separate template overloads to address the Windows Debug build warning 'unreachable code'.

@qjia7

…oft#24115) ### Description This PR introduces a new WebGPU EP option `preserveDevice`. Before this change, a WebGPU device is destroyed when no inference session uses it, and destroying a WebGPU device cleans up both the buffer cache and the shader cache. When the option is ON (the default is OFF), the device is never destroyed and is always kept alive. This is helpful in 2 scenarios: - A server that is always on - Unit tests, so that bugs caused by an incorrect shader cache may be detected. (thanks to @qjia7 for the suggestion)
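The lifetime rule for `preserveDevice` can be modeled with a reference count. The class below is hypothetical and only mirrors the behavior described in the commit message. It is not the WebGPU EP implementation and does not use any real onnxruntime-web API.

```javascript
// Hypothetical model of the device-lifetime rule; names are illustrative only.
class WebGpuDeviceLifetime {
  constructor(preserveDevice = false) {
    this.preserveDevice = preserveDevice; // OFF by default, as described above
    this.sessionCount = 0;
    this.device = null;
  }

  // Called when an inference session starts using the device.
  acquire() {
    if (this.device === null) {
      // (Re)create the device together with empty buffer and shader caches.
      this.device = { bufferCache: new Map(), shaderCache: new Map() };
    }
    this.sessionCount++;
    return this.device;
  }

  // Called when an inference session is released.
  release() {
    this.sessionCount--;
    // Without preserveDevice, the last release destroys the device and with it
    // both caches; with preserveDevice, the device and its caches stay alive.
    if (this.sessionCount === 0 && !this.preserveDevice) {
      this.device = null;
    }
  }
}

const kept = new WebGpuDeviceLifetime(true);
kept.acquire().shaderCache.set('matmul', 'compiled-shader');
kept.release();
console.log(kept.acquire().shaderCache.has('matmul')); // true: the cache survived
```

With the default (`false`), the same sequence would recreate the device and its `shaderCache.has('matmul')` would be `false`.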

…oft#24014)

### Description

This gives webapp developers a way to customize the bundler's behavior regarding whether to bundle the wasm. To avoid treating ort-wasm-threaded-simd.jsep.mjs and ort-wasm-threaded-simd.jsep.wasm as dependencies during the bundler build, use the import condition `onnxruntime-web-use-extern-wasm`.

For webpack:
```
module.exports = {
  //...
  resolve: {
    conditionNames: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
  },
};
```
For esbuild:
```
await esbuild.build({
  //...
  conditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module'],
})
```
For rollup:
```
import { nodeResolve } from '@rollup/plugin-node-resolve';

export default {
  //...
  plugins: [nodeResolve({ exportConditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module', 'development|production'] })]
};
```
### Motivation and Context

- microsoft#24009

…oft#23937)

### Description

Add an API for accessing the metadata of a model's inputs/outputs. Currently, the implementation is only applied to the WebAssembly backend and the Node.js binding. For WebGL, there is so far no plan to implement this API; for React Native, the implementation will be done later and is not included in this PR.

#### Example usage:

```js
const mySession = await ort.InferenceSession.create( ... );

console.log(`there are ${mySession.inputMetadata.length} inputs:`);
for (let i = 0; i < mySession.inputMetadata.length; i++) {
  let info;
  if (mySession.inputMetadata[i].isTensor) {
    info = `tensor: ${mySession.inputMetadata[i].type}, shape: ${mySession.inputMetadata[i].shape}`;
  } else {
    info = `non-tensor`;
  }
  console.log(`input ${i}: ${mySession.inputMetadata[i].name}: ${info}`);
}
```

possible output:

```
there are 1 inputs:
input 0: input: tensor: float32, shape: [batch, 3, 224, 224]
```

Resolves:
- microsoft#22682
- microsoft#22949

### Description Add cache "onnxnodetests" for node tests. This fixes the random network errors when downloading the ONNX node test data. ### Motivation and Context

### Description Add native MatMul (`MatMulNaive`, `MatMulPacked` and `MatMulPackedVec4`) ### Motivation and Context

### Description The big model pipeline is still using CUDA 11.8. This updates the pipeline to use CUDA 12.x. ### Motivation and Context

…icrosoft#24151)

### Description

Show a proper error message when an fp16 model is used for Beam Search on CPU.

Before:
```
2025-02-15 20:15:02.999160115 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: bad_function_call
```
After:
```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: onnxruntime/onnxruntime/contrib_ops/cpu/transformers/beam_search.cc:309 virtual onnxruntime::common::Status onnxruntime::contrib::transformers::BeamSearch::Compute(onnxruntime::OpKernelContext*) const BeamSearch does not support float16 model on CPU execution provider. Use float32 model or CUDA execution provider instead.
```
### Motivation and Context

microsoft#23728

### Description As titled. ### Motivation and Context The last MatMul in the phi-4-mini ONNX model has b_shape = {3072, 200064}, and packed_b_size = MlasGemmPackBSize(N, K) is `3072*200064*sizeof(float)=2458386432`. This is larger than 2,147,483,647 (the maximum of a 32-bit `int`), so the value does not fit in an `int` and len becomes negative. We change the type to size_t, and the model loads successfully after the change.
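The arithmetic can be reproduced directly. This sketch assumes K = 3072 and N = 200064 from b_shape, 4-byte floats, and no extra packing or alignment padding from MlasGemmPackBSize. JavaScript's `| 0` truncates to a signed 32-bit integer, which mimics the two's-complement wrap that a narrowing conversion to `int` typically produces.

```javascript
// Reproduce the size computation for the packed B matrix (assumptions above).
const K = 3072;
const N = 200064;
const SIZEOF_FLOAT = 4;
const INT32_MAX = 2 ** 31 - 1; // 2147483647

// Exact in a JS double, since the value is far below 2^53.
const packedBytes = K * N * SIZEOF_FLOAT;
// Truncate to a signed 32-bit value, like storing it in a 32-bit `int`.
const asInt32 = packedBytes | 0;

console.log(packedBytes);             // 2458386432
console.log(packedBytes > INT32_MAX); // true
console.log(asInt32);                 // -1836580864 (negative, as in the bug)
// A 64-bit unsigned size (size_t on 64-bit targets) holds it unchanged.
console.log(BigInt.asUintN(64, BigInt(packedBytes))); // 2458386432n
```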

### Description Limit the pipeline's ability to build CUDA 11. However, references to CUDA 11 are not completely removed in this PR. We will keep them in case we decide to support both CUDA 13 and CUDA 12 in the future. ### Motivation and Context

### Description Move the x64 part of "Linux CPU CI pipeline" to Github Actions

…rosoft#24136) Move the allocator data member declaration before the `Ort::Value` container data members that might use the allocator so that the `Ort::Value` containers will be destroyed first. `custom_allocator_` may be used as the allocator for the `Ort::Value`s in `test_inputs_` and `outputs_`. The allocator shouldn't be destroyed before `Ort::Value`s allocated with it are freed.

### Description Fix the layout transformer for FusedConv. The current layout transformer transforms `FusedConv` (kMSDomain) into `FusedConv` (kMSInternalNHWCDomain) if the EP wants channels_last. However, kMSInternalNHWCDomain uses OpType `Conv` for both Conv and FusedConv, so `FusedConv` (kMSInternalNHWCDomain) is invalid (an unregistered op). This PR fixes this and allows the layout transformer to change `FusedConv` (kMSDomain) into `Conv` (kMSInternalNHWCDomain). ### Motivation and Context
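The renaming rule can be sketched as a small node mapping. `toChannelsLast` is a hypothetical helper, not the ORT layout transformer. The domain strings are assumed values for ORT's kOnnxDomain, kMSDomain and kMSInternalNHWCDomain constants. Spreading the node to carry over attributes (for example the fused activation) is also an assumption about how the real transformer preserves them.

```javascript
// Assumed values of ORT's domain constants.
const kOnnxDomain = '';
const kMSDomain = 'com.microsoft';
const kMSInternalNHWCDomain = 'com.ms.internal.nhwc';

// Hypothetical sketch of the op-type mapping applied when an EP wants channels_last.
function toChannelsLast(node) {
  const isOnnxConv = node.domain === kOnnxDomain && node.opType === 'Conv';
  const isFusedConv = node.domain === kMSDomain && node.opType === 'FusedConv';
  if (isOnnxConv || isFusedConv) {
    // The NHWC domain registers a single `Conv` op type for both cases, so
    // FusedConv is renamed instead of becoming an unregistered NHWC `FusedConv`.
    return { ...node, domain: kMSInternalNHWCDomain, opType: 'Conv' };
  }
  return node; // other nodes are left unchanged in this sketch
}

console.log(toChannelsLast({ domain: kMSDomain, opType: 'FusedConv', attributes: { activation: 'Relu' } }));
```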

…idail is disabled in MacOS and iOS packaging stage due to (microsoft#24152) (microsoft#24153) NuGet_Packaging_CPU is broken due to a similar issue from microsoft#23923 ### Description Migrate [Zip-Nuget Package Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=940&_a=summary) to 1ES ### Motivation and Context  ### Check list - [x] Issue with onnxruntime-Win-CPU-2022 - [x] [Spot Bug](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=697830&view=logs&j=6c6a898f-bbbb-5c72-8695-82b606149fa2&t=433f102b-5ed3-5fed-87a0-6107744ce9b1&l=81)

### Description Update the minimum supported GCC version to 11.1. ### Motivation and Context In order to utilize new CPU instructions, we need to use new compilers. For example, our MLAS code needs bfloat16 support for ARM, which requires GCC version >=10, and some other code requires GCC version >=11.1. Also, our CI pipelines only test the code with GCC 11, 12 and 14. Therefore this PR increases the minimum GCC version to 11.1. We will update it to 12 once we deprecate the CUDA 11 pipelines.

) The --use_vcpkg option seems to be causing problems for --arm64ec Python packages (onnxruntime-qnn): session creation crashes for packages built with --use_vcpkg. The released onnxruntime-qnn 1.21.0 Python wheel for x64 (arm64ec) has this issue. We are removing --use_vcpkg while the issue is debugged in parallel, and plan to release a 1.21.1 onnxruntime-qnn x64 Python wheel without --use_vcpkg to address the crash. microsoft#24082

Adds the GEMM operator to the WebGPU EP. --------- Co-authored-by: Xiaofei Han <[email protected]> Co-authored-by: Yulong Wang <[email protected]>

### Description There is a slight mismatch in the build flags for the Web build pipeline when using vcpkg. A [fix](microsoft#24012) is on the way, but for now we need to disable vcpkg for the next patch release. ### Motivation and Context

### Description - Remove the x86_64/Debug build from the matrix to reduce the number of jobs - Set max-parallel to 1 to avoid big backlogs (a single PR will take longer, but there is less traffic in the pipeline)

### Description Currently it is triggered on every branch.

### Description Upgrade QNN to the latest version, 2.32.0.250228.

Fixes microsoft#24070 by explicitly forcing single-threaded, sequential execution in the case where `reduction=none && hasDuplicates`.
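Sequential execution matters for duplicates because, with `reduction=none`, each duplicate index overwrites the same output element, so applying updates in order makes the last one win deterministically, whereas parallel writes have no defined winner. The CPU sketch below over a 1-D tensor is a hypothetical illustration of that case, not the JSEP/WebGPU kernel; `hasDuplicates` and `scatterND1D` are illustrative names.

```javascript
// Detect repeated index tuples (hypothetical helper).
function hasDuplicates(indices) {
  const seen = new Set();
  for (const idx of indices) {
    const key = idx.join(',');
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

// ScatterND restricted to a 1-D data tensor, applying updates strictly in order.
function scatterND1D(data, indices, updates, reduction = 'none') {
  const out = data.slice();
  for (let i = 0; i < indices.length; i++) {
    const j = indices[i][0];
    if (reduction === 'none') out[j] = updates[i]; // later duplicates overwrite earlier ones
    else if (reduction === 'add') out[j] += updates[i];
    else throw new Error(`reduction '${reduction}' is not covered by this sketch`);
  }
  return out;
}

const indices = [[1], [3], [1]];
console.log(hasDuplicates(indices)); // true
console.log(scatterND1D([0, 0, 0, 0], indices, [10, 20, 30])); // [ 0, 30, 0, 20 ]
```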

…4191)

…24194) This is a workaround for a build error. See microsoft#24152.

### Description  ### Motivation and Context

### Description  Add infrastructure to enable auto EP selection. Device discovery for CPU/GPU/NPU on Windows. Supports internal (CPU/DML/WebGPU) and provider bridge (CUDA) EPs currently. Infrastructure will be used with plugin EPs next. Selection policy implementation will be added next, so in the interim there's a temporary function with manually specified selection so unit tests can cover the end-to-end. ### Motivation and Context  --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adrian Lizarraga <[email protected]>

) ### Description WebNN doesn't support AveragePool with count_include_pad == 1. ### Motivation and Context Support it by adding an explicit pad and calling averagePool2D with all pads set to 0.
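This workaround is correct because zero-padding the input explicitly and then pooling without pads divides every window by the full kernel size, which is what count_include_pad == 1 requires. The 1-D sketch below checks this numerically. `avgPool1d` and `zeroPad1d` are hypothetical helpers, not WebNN or ORT code, and ceil_mode, dilations and 2-D kernels are ignored.

```javascript
// 1-D average pooling with explicit pads. countIncludePad=true divides by the
// full kernel size (padded zeros count); false divides by the real elements only.
function avgPool1d(x, kernel, stride, padBegin, padEnd, countIncludePad) {
  const outLen = Math.floor((x.length + padBegin + padEnd - kernel) / stride) + 1;
  const out = [];
  for (let o = 0; o < outLen; o++) {
    const start = o * stride - padBegin;
    let sum = 0;
    let valid = 0;
    for (let t = start; t < start + kernel; t++) {
      if (t >= 0 && t < x.length) {
        sum += x[t];
        valid++;
      }
    }
    out.push(sum / (countIncludePad ? kernel : valid));
  }
  return out;
}

// Materialize the padding as explicit zeros (the "adding a pad" step).
function zeroPad1d(x, padBegin, padEnd) {
  return [...new Array(padBegin).fill(0), ...x, ...new Array(padEnd).fill(0)];
}

const x = [1, 2, 3, 4];
const reference = avgPool1d(x, 3, 1, 1, 1, true);                    // count_include_pad == 1
const workaround = avgPool1d(zeroPad1d(x, 1, 1), 3, 1, 0, 0, false); // pad, then pads = 0
const excludePad = avgPool1d(x, 3, 1, 1, 1, false);                  // count_include_pad == 0
console.log(reference, workaround, excludePad);
```

`reference` and `workaround` agree element by element, while `excludePad` differs at the borders (for example 1.5 instead of 1 for the first window).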

### Description  Fix some issues. Use adapter number instead of bus number. Bus number doesn't work as expected on VMs. Disable for XBOX build. Needs different handling for adapter lookup. Use adapter number as device_id when creating DML OrtEpDevice. Fix some issues with the metadata. ### Motivation and Context

### Description Cherry pick the following into [rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0) - (microsoft#24487) - (microsoft#24466) - (microsoft#24493) - (microsoft#24484) - (microsoft#24494) - (microsoft#24489) - (microsoft#24504) - (microsoft#24510) - (microsoft#24456) - (microsoft#24537) - (microsoft#24501) - (microsoft#24519) - (microsoft#24513) - (microsoft#24539) - (microsoft#24514) - (microsoft#24542) - (microsoft#24585) Not added: Planning to cherry pick Cuda Matmulnbits PRs once the fix for failing cuda pipeline is ready - (microsoft#24491) - (microsoft#24509) - (microsoft#24564) --------- Co-authored-by: Adrian Lizarraga <[email protected]> Co-authored-by: minfhong-quic <[email protected]> Co-authored-by: minfhong-quic <[email protected]> Co-authored-by: Justin Chu <[email protected]> Co-authored-by: Prathik Rao <[email protected]> Co-authored-by: Edward Chen <[email protected]> Co-authored-by: Ankan Banerjee <[email protected]> Co-authored-by: Maximilian Müller <[email protected]> Co-authored-by: Gaurav Garg <[email protected]> Co-authored-by: iraut <[email protected]> Co-authored-by: Hrishikesh Manohar <[email protected]> Co-authored-by: Maximilian Müller <[email protected]> Co-authored-by: Scott McKay <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: kunal-vaishnavi <[email protected]> Co-authored-by: xhcao <[email protected]>

### Description Cherry pick the following into [rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0) - (microsoft#24491) - (microsoft#24509) - (microsoft#24564) - (microsoft#24574) - (microsoft#24582) - (microsoft#24584) - (microsoft#24568) - (microsoft#24587) - (microsoft#24563) - (microsoft#24592) - (microsoft#24526) - (microsoft#24552) - (microsoft#24588) - (microsoft#24605) - (microsoft#24606) --------- Co-authored-by: Jing Fang <[email protected]> Co-authored-by: Tianlei Wu <[email protected]> Co-authored-by: Baiju Meswani <[email protected]> Co-authored-by: Scott McKay <[email protected]> Co-authored-by: Mark Schofield <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Edward Chen <[email protected]> Co-authored-by: Ashwath Shankarnarayan <[email protected]> Co-authored-by: saurabh <[email protected]> Co-authored-by: Adrian Lizarraga <[email protected]> Co-authored-by: Hector Li <[email protected]>

### Description Cherry pick the following into [rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0) - (microsoft#24608) - (microsoft#24545) --------- Co-authored-by: Changming Sun <[email protected]> Co-authored-by: Maximilian Müller <[email protected]>

### Description  Add microsoft#24625 ### Motivation and Context  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: George Wu <[email protected]>

…ft#24630) ### Description Adds microsoft#24629 to the ORT 1.22.0 release branch ### Motivation and Context

… (microsoft#24638) ### Description Adds support for selection policy delegate directly to the release branch. This is necessary to avoid having to update C# bindings (which are in main but not in the release branch) Based on microsoft#24635 ### Motivation and Context

Co-authored-by: Baiju Meswani <[email protected]>

…t#24651)" (microsoft#24668) This reverts commit 8fbc5d7 which results in packaging pipeline failures

### Description Update the folder name from win-arm64x to win-arm64, since win-arm64x is not a valid RID: https://learn.microsoft.com/en-us/dotnet/core/rid-catalog#windows-rids ### Description cherry-pick from microsoft#24690

Fix pipeline rename conflict to create NuGet release package --------- Co-authored-by: Alex Marin <[email protected]>

….22_release_tag_merge

TedThemistokleous · 2025-06-04T15:10:19Z

Running a build with this on an internal system to ensure we don't hit any compile-time errors. Will flip to review when ready.

TedThemistokleous · 2025-06-04T20:13:08Z

Seems to build. Hitting an error on one of the tests; will investigate before merging.

TedThemistokleous · 2025-06-04T21:26:37Z

Related to a test that isn't stubbed out for the ROCm EP the way it is for the CUDA EP. The issue seems benign and on the CUDA side: since we're hipifying the CUDA kernels, we can assume this will resolve once the CUDA test is fixed. I'll be upstreaming this patch to Onnxrt mainline as well.

TedThemistokleous · 2025-06-05T03:55:51Z

There's an exclude that needs to go into the failing test (also excludes CUDA EP)

Confirmed this works with that change.

This is needed since the ROCm EP just hipifies the CUDA kernels used for this test, so it would report false failures when the CUDA EP is in fact doing the same thing.

TedThemistokleous · 2025-06-05T04:00:40Z

The cherry-pick for this is going to mainline via microsoft#24961.

pravg-amd and others added 30 commits March 24, 2025 10:11

[CPU] Add fp16 support to sparse attention (microsoft#24015)

828e372

refactor mac CI pipelines (microsoft#24138)

373b9e2

Address Windows CUDA build issue (microsoft#24149)

5244d68

Move Linux CPU CI pipeline to Github Actions (microsoft#24154)

8680667

[WebGPU EP] Add GEMM implementation (microsoft#24023)

a8673c6

revise mac os pipeline to reduce the amount of jobs (microsoft#24177)

32b376c

fix triggering for "Validate Gradle Wrapper" pipeline (microsoft#24181)

be1cfc4

upgrade QNN to version 2.32.0.250228 (microsoft#23977)

5d805c2

[JSEP] adjust edge case logic for scatternd (microsoft#24172)

24ece47

Make the custom nuget packaging pipeline 1ES compliant. (microsoft#24191)

1f70fc2

Disable KleidiAI in Python Packaging pipeline MacOS build (microsoft#24194)

4d13b70

Rolling back the python/cuda (microsoft#24170)

041674a

skottmckay and others added 14 commits April 20, 2025 16:14

[WebNN] Support AveragePool with count_include_pad == 1 (microsoft#24465)

9c6351f

Publish debug symbols for windows (microsoft#24643) (microsoft#24651)

8fbc5d7

Revert "Publish debug symbols for windows (microsoft#24643) (microsoft#24651)" (microsoft#24668)

6b0f7c9

Qnn nuget package update for arm64x (microsoft#24690) (microsoft#24694)

6c8097a

Cherry pick fix for NuGet DML Release package Issue (microsoft#24696)

f217402

Merge commit 'f217402897f40ebba457e2421bc0a4702771968e' into onnxrt_1.22_release_tag_merge

0fd2dc8

TedThemistokleous requested review from ahsan-ca, causten and eddieliao June 4, 2025 15:09

TedThemistokleous self-assigned this Jun 4, 2025

TedThemistokleous added the Roadmap Item within release roadmap label Jun 4, 2025

Add ROCm execution provider to excluded EP for test with Cuda EP

b0d9525

streamhsa force-pushed the onnxrt_1.22_release_tag_merge branch from f2d84ce to b0d9525 Compare June 5, 2025 03:57

eddieliao approved these changes Jun 5, 2025

causten approved these changes Jun 5, 2025

ahsan-ca approved these changes Jun 5, 2025

TedThemistokleous merged commit 762fb30 into rocm7.0_internal_testing Jun 5, 2025
3 of 6 checks passed

62 participants
