Exclude MAUI projects from GPU C# packaging builds #23923
skottmckay merged 1 commit into main from skottmckay/Exclude_MAUI_projects_from_GPU_Csharp_packaging_builds on Mar 7, 2025
Conversation
… to include any MAUI support for those.
amarin16 approved these changes on Mar 6, 2025
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request on Mar 10, 2025
* Fix flash attention for GQA (Phi4) (microsoft#23850) ### Description This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be `k_start + capped_sg_id < seq_causal_length` check. This is either because, a. seq_causal_length varies per lane, so the check becomes non uniform control flow, which is having interactions with subgroupShuffle. or b. The check itself is incorrect and is wiping out values of v based on the source lane's seq_causal_length. While in actualness values of v need to be causal as per the lane that is going to multiply it with qkt. qkt is already causal because earlier values of qk for out of bounds k are set to min_value, and exp(<-4) are 0. This fix works by removing that causal check and relying on the qk being wiped out earlier. The documentation for causality behavior for GQA is missing to determine which of this reason is the true reason. Prior to this prompts with sequence length > 16 < 32 or 1k would break with Phi 4 but smaller prompts would work. Tested on Intel Alderlake, Nvidia 4070. * Model Builder API (microsoft#23223) ### Description <!-- Describe your changes. --> Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Fix typo: change `Upample` to `Upsample`. (microsoft#23838) ### Description <!-- Describe your changes. --> Fixed a typo in function names related to the Upsample CUDA kernel. Changed incorrect spelling Upample to Upsample across relevant functions. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is necessary to maintain consistency and prevent potential confusion caused by incorrect function names. * [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (microsoft#23848) ### Description <!-- Describe your changes. --> Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (microsoft#23856) * Change the logic to generate the default ep context file name (microsoft#23788) Change the logic to generate the default ep context file name ### Description Applies to all EPs: replace the .onnx to _ctx.onnx, instead of directly append extra string _ctx.onnx to existing model path. In QNN EP, also make the context binary .bin file shorter by removing QNNExecutionProvider_ from the file name. * Make Nuget QNN package pipeline 1ES compliant (microsoft#23805) ### Description Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234)1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [js/common] allows using Uint16Array as data for float16 tensor (microsoft#23827) ### Description Resolve microsoft#23817 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> * [js/webgpu] Reland the optimization of ConvTranspose (microsoft#23858) This PR fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation. * [OpenVINO] Fix a build warning (microsoft#23877) ### Description Fix a warning with std::move usage ### Motivation and Context Possibly allow building without --compile_no_warning_as_error flag * Change gsl::byte to std::byte (microsoft#23872) To be compatible with the latest GSL library. Without this fix we will get: ``` onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead. ``` * Allow using extended minimal build for several EPs (microsoft#23834) ### Description #### Background From code search, the following EPs use `onnxruntime::GetCpuPreferredNodes()` in their `GetCapabilities()` methods: - CANN - CUDA - DML - JS - ROCM - WebGPU However, the source file that implements `onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is ON: https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42 This means that all EPs mentioned above is not able to compile with minimal build. #### Solution The excluded file `core/framework/fallback_cpu_capability.cc` cannot build in minimal build because some of its dependencies are not included in the minimal build. However, in extended minimal build mode, all dependencies are available. This PR looses the restrict and allows to compile this file when it is extended minimal build. After this change, those EPs are able to compile in extended minimal build. * Add dawn to ThirdPartyNotices (microsoft#23876) ### Description Add `dawn` to ThirdPartyNotices. * Enable QNN EP weight sharing generation using public API (microsoft#23702) ### Description Enable QNN EP weight sharing generation using public API instead of internal interfaces, so that user can integrate into their own toolchain. The change is to share the QnnBackendManager across ORT sessions if ep.share_ep_contexts is enabled. And there is extra option to end the share so that we know when to remove the shared QnnBackendManager from the singleton. Change the tool name from onnxruntime_qnn_ctx_gen to ep_weight_sharing_ctx_gen, so that it can be shared for other EPs. * [QNN-EP]: Fix inference failures while running with htp_shared_memory (microsoft#23892) ### Description When using the enable_htp_shared_memory feature, we see that the address of the buffer passed to rpcmem_free is incorrect. So the rpc buffers are not freed leading to memory exhaustion. ### Motivation and Context When using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions, it leads to inference failures during the second prompt. As GenAI memory asks are higher, it surfaces sooner in gen AI use cases. Co-authored-by: Ashish Garg <[email protected]> * Fix enable_pix_capture build for WebGPU (microsoft#23857) The build option --enable_pix_capture is broken. This fixes the problem. --------- Co-authored-by: wp <[email protected]> * [WebGPU-EP Native] Add ReduceMean (microsoft#23860) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> * [WebGPU EP] introduce BiasAdd contrib op (microsoft#23861) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Dynamo export and improve benchmark script for SAM2 encoder (microsoft#23887) ### Description * Add dynamo export for Sam2 image encoder * Verify fp32 onnx model with CPU EP (to avoid error message from TRT EP). * Update benchmark script: - output ORT profiling - output torch compiled code and unique kernel name for compiled kernel - add an option for nightly package installation - uninstall existing ort packages before installing The node metadata of dynamo exported model can help mapping node in onnx model back to pytorch modeling script. Currently, the graph optimization is not done on dynamo exported model, so it is experimental right now. ### Motivation and Context To support profiling of torch compiled CUDA kernel. * [js/web] improve workaround for bundlers (microsoft#23902) ### Description This PR improves the workaround for bundlers in onnxruntime-web. Specifically, the following changes have been made: - Use [this workaround](xenova@9c50aa2) as suggested by @xenova in huggingface/transformers.js#1161 (comment) - Use `url > "file:" && url < "file;"` instead of `url.startsWith("file:")` to allow minifiers to remove dead code correctly. This change allows to remove unnecessary dependencies of file parsed from `new URL("ort.bundle.min.js", import.meta.url)` in Vite, and optimize code like `if("file://filepath.js".startsWith("file:")) {do_sth1(); } else {do_sth2();}` into `do_sth1()` for webpack/terser usages. Resolves huggingface/transformers.js#1161 * [webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (microsoft#23349) ### Description This change restores the MatMulNBits workgroup size from (8, 8, 1) back to (16, 8, 1) to resolve a performance regression observed on Intel iGPUs during token generation (M=1). ### Motivation and Context As above. Signed-off-by: Jianhui Dai <[email protected]> * [webgpu] support Pad operator (microsoft#23141) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [WebNN] Accept Float16Array for float16 data type if it is available (microsoft#23894) Float16Array is now shipping and WebNN Chromium implementation has accepted it. We should allow it in WebNN EP as well. * Ensure that the 'cmake_minimum_required' is version 3.5 or greater (microsoft#23888) ### Description CMake 4.0 release candidate 2.0 is available, and it cannot compile all of OnnxRuntime out-of-the-box. There's portions of the OnnxRuntime codebase that specify a `cmake_minimum_required` version of 3.0, and CMake 4.0 has removed support for compatibility with CMake < 3.5 - the following error is reported: ``` CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required): Compatibility with CMake < 3.5 has been removed from CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway. ``` Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to set that as a minimum version to fix the error. 
The root CMakeLists.txt does ask for a minimum version of 3.28, so we could snap to that, but I'm still ramping up on the build, so wanted to propose a minimally sufficient fix. ### Motivation and Context Being able to build with the latest CMake - when it ships - reduces the barrier to entry to building OnnxRuntime, and allows the OnnxRuntime to leverage the latest and greatest tooling. * WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (microsoft#23898) This PR removes the deprecated subgroups-f16 from WebGPU native and JS EP, and also remove the unused deviceInfo in WebGPU JS EP. * [JSEP/WebGPU] Fixed error in softmax dispatch. (microsoft#23906) ### Description Fixed an error softmax dispatch ### Motivation and Context Produce expected results for LlaMA model * enable WebGPU EP in WebAssembly build (microsoft#23913) ### Description This PR is the first step for migrating the webgpu backend of onnxruntime-web from JSEP based to WebGPU EP based. In this change, we enable building WebGPU EP in a wasm build (ie. `--build_wasm` `--use_webgpu` `--use_jsep`). However, the old build flags should still keep previous behavior. * Adding OpenVINO Windows CI Pipeline (microsoft#23919) ### Description <!-- Describe your changes. --> Enable an OpenVINO Windows CI pipeline. This includes: - Downloading the OpenVINO toolkit for Windows from an external source. - Setting up OpenVINO environment variables. - Building the ONNX Runtime OpenVINO Execution Provider. - Running unit tests. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is required to run checks on precommit and commit in the ONNX Runtime project. It ensures that the code is tested with the OpenVINO toolkit on Windows, improving the reliability and compatibility of the project. * [WebGPU EP] SoftMax Implementation (microsoft#23538) Increase coverage for WebGPU Op * Exclude MAUI projects from GPU C# packaging builds (microsoft#23923) ### Description <!-- Describe your changes. --> Use 'desktop only' solution in GPU C# packaging builds. We don't need to include any MAUI support for those builds. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Support all block sizes that are multiples of 32 for DP4A (microsoft#23907) ### Description Simple change 1. The DP4A shader actually supports all block sizes that are multiples of 32, relaxing the restriction and making a small tweak to support sizes other than 32. 2. Moved the shader to a separate file for maintainability. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Example custom op with output type inferencing (microsoft#23916) ### Description <!-- Describe your changes. --> Add example of a custom op that is required to do type inference for the output type for the model load to work. Also acts as an example of how to override an ONNX op with a custom implementation. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> microsoft#23891 * Enabling L2+ Optimizations for EPs (microsoft#23517) There are some requirements to modify the graph which are specific to the EP/hardware. ORT has the hardcoded EP list for optimizations but that can't scale and it's hard be extended to enable EP custom optimizations. 
Here is the prototype to enable L2+ optimizations for EPs (The original overview is provided by @skottmckay) as well as the TRT EP implementation for the ConstantFoldingDQ optimization. Signatures for selection and optimization functions: ```` - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)> - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)> ```` GetCapability - call (new) provider bridge API to lookup pre-defined optimizer by name and get selection function - ComputeCapability.optimize_func, i.e. optimization function, would be set by the optimizer to the function that does the optimization - EP has to update the returning ComputeCapability to include the optimization ComputeCapability in nodes_to_optimize. So that later ORT can perform optimization/transformation accordingly. GraphPartitioner - After assigning the ComputeCapability to the EP and prior to Compile, if the ComputeCapability has nodes_to_optimize, iterate that list - optimization function needs to be called with - a mutable Graph instance - the ComputeCapability for the individual optimization - the overall ComputeCapability so it can be updated * fix binplace file in web pipeline (microsoft#23930) * Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (microsoft#23931) * Fix ConvInteger handling of optional inputs. (microsoft#23935) ### Description <!-- Describe your changes. --> Fix ConvInteger handling of optional inputs. Need to check Exists() and not just the number of inputs. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> microsoft#23927 * Updated ov version in pipeline (#595) (microsoft#23882) ### Description This PR updates the OpenVINO version used in the pipeline from 2024.5.0 to 2025.0.0 Co-authored-by: jatinwadhwa921 <[email protected]> * [AIX] External data handling (microsoft#23859) ### Description In BE system, model tensor data coming from external file is not handled properly. This was found during the debugging of (microsoft/onnxruntime-genai#1104) This PR changes do the endianness conversion of data loaded from external file in BE system. * Create a packaging pipeline for a custom nuget package (microsoft#23918) * Fix license in example test code. (microsoft#23936) * replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (microsoft#23926) ### Description `gsl::narrow` does not work in no exception build. - use `onnxruntime::narrow` if necessary; - or change to `static_cast` if it's obviously safe. also apply the changes to usage of `gsl::narrow_cast`, which does not apply checks. * VCPKG improvement: set VCPKG_OSX_DEPLOYMENT_TARGET (microsoft#23933) ### Description 1. Set VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets 2. Enable VCPKG in more pipelines. * Allow using a different version of flatbuffers when building with vcpkg (microsoft#23946) ### Description Allow using a different version of flatbuffers when building with vcpkg, so that users do not need to pin flatbuffer's version, which provides more flexibility in the build process. Delete utf8_range from the dependencies, because it is an indirect dependency of protobuf, which is already included in the build process. 
### Motivation and Context * Make python package pipeline 1ES compliant (microsoft#23800) ### Description Make [Python packaging pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841) 1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Checklist - [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless * Delete ROCM Nuget Publishing Pipeline (microsoft#23948) * Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924) Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.9 to 2.1.10. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/SixLabors/ImageSharp/releases">SixLabors.ImageSharp's releases</a>.</em></p> <blockquote> <h2>v2.1.10</h2> <h2>What's Changed</h2> <ul> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2859">#2859</a> to release/2.1.x by <a href="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/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2890">SixLabors/ImageSharp#2890</a></li> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy] by <a href="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/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2891">SixLabors/ImageSharp#2891</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/SixLabors/ImageSharp/commit/d133ef99e8becfc3b924b0bb4315e63b8681d307"><code>d133ef9</code></a> Set lang version</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/5dfe5a800367581239de442cc18de659da6e9b1d"><code>5dfe5a8</code></a> Missed cache action update</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4d3a85112b03c89d2cb8616a5b747684b6e73730"><code>4d3a851</code></a> Use latest cache action</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4cb9f40a722ab2b837157862f0320c6a652da4d0"><code>4cb9f40</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2891">#2891</a> from SixLabors/af/backport-2701</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/bb82f79db0197166271d4355b5fb5ceda370a906"><code>bb82f79</code></a> <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy]</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/627b5f721f30f6d529acb50bd81f92bd3db754eb"><code>627b5f7</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2890">#2890</a> from SixLabors/af/backport-2859</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/67f7848d6e975e7956c8056823555de49a5fdf6d"><code>67f7848</code></a> try to fix LFS for *.BMP</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/44d294e06606111195152ead3006452357ef1bb9"><code>44d294e</code></a> 8.0.x is not needed</li> <li><a 
href="https://github.com/SixLabors/ImageSharp/commit/adb85d9e66aa3a588a86f4a4ef9a0539a8502117"><code>adb85d9</code></a> Another attempt for a Linux-specific skip</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/efc3fc4ee15eec4e523c26f7130e786541b00df2"><code>efc3fc4</code></a> Disable BmpDecoder_CanDecode_Os2BitmapArray on Linux</li> <li>Additional commits viewable in <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). 
</details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: Jianhui Dai <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Sushanth Rajasankar <[email protected]> Co-authored-by: Scott McKay <[email protected]> Co-authored-by: Seungtaek Kim <[email protected]> Co-authored-by: co63oc <[email protected]> Co-authored-by: Jambay Kinley <[email protected]> Co-authored-by: Hector Li <[email protected]> Co-authored-by: Jian Chen <[email protected]> Co-authored-by: Yulong Wang <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: Alessio Soldano <[email protected]> Co-authored-by: Changming Sun <[email protected]> Co-authored-by: Ashish Garg <[email protected]> Co-authored-by: Ashish Garg <[email protected]> Co-authored-by: Jie Chen <[email protected]> Co-authored-by: wp <[email protected]> Co-authored-by: Satya Kumar Jandhyala <[email protected]> Co-authored-by: Prathik Rao <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Tianlei Wu <[email protected]> Co-authored-by: Jianhui Dai <[email protected]> Co-authored-by: xhcao <[email protected]> Co-authored-by: Wanming Lin <[email protected]> Co-authored-by: Mark Schofield <[email protected]> Co-authored-by: jiangzhaoming <[email protected]> Co-authored-by: Yi-Hong Lyu <[email protected]> Co-authored-by: vraspar <[email protected]> Co-authored-by: Chi Lo <[email protected]> Co-authored-by: saurabh <[email protected]> Co-authored-by: Ranjit Ranjan <[email protected]> Co-authored-by: Baiju Meswani <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
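The EP-context naming change quoted above (#23788) replaces the model's `.onnx` extension with `_ctx.onnx` instead of appending the suffix to the existing path. A minimal sketch of that renaming rule, assuming an illustrative helper name (this is not the actual ORT implementation):

```cpp
#include <filesystem>
#include <iostream>

// Illustrative helper: derive the EP context model path by replacing the
// ".onnx" extension rather than appending "_ctx.onnx" to the full path.
std::filesystem::path MakeEpContextPath(const std::filesystem::path& model_path) {
  std::filesystem::path ctx_path = model_path;
  ctx_path.replace_extension();   // "model.onnx" -> "model"
  ctx_path += "_ctx.onnx";        // "model"      -> "model_ctx.onnx"
  return ctx_path;
}

int main() {
  // Prints "models/phi3/model_ctx.onnx" (quoted by the path stream operator).
  std::cout << MakeEpContextPath("models/phi3/model.onnx") << "\n";
  return 0;
}
```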
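The ConvInteger fix quoted above (#23935) hinges on the difference between an optional input slot being counted and the input actually being present. A self-contained illustration of that distinction, using placeholder types rather than ORT's kernel API:

```cpp
#include <cstdio>
#include <vector>

struct Tensor { /* placeholder for an input value */ };

// Correct check: a missing optional input is a null entry, even though it is
// still counted in the input list.
bool HasOptionalInput(const std::vector<const Tensor*>& inputs, size_t index) {
  return index < inputs.size() && inputs[index] != nullptr;
}

int main() {
  Tensor x, w;
  std::vector<const Tensor*> inputs{&x, &w, nullptr};  // zero-point input omitted
  std::printf("count-based check: %d\n", inputs.size() > 2);           // wrongly reports present
  std::printf("presence check:    %d\n", HasOptionalInput(inputs, 2)); // correctly reports absent
  return 0;
}
```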
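The `gsl::narrow` replacement quoted above (#23926) is driven by no-exception builds: `gsl::narrow` throws on a lossy conversion, so the alternatives are a checked narrow that fails fast without throwing, or a plain `static_cast` where safety is obvious. A self-contained sketch of that trade-off; the checked helper below is only an approximation of the idea, not `onnxruntime::narrow` itself:

```cpp
#include <cstdint>
#include <cstdlib>

// Approximation of a checked narrowing cast usable without exceptions:
// terminate on a lossy conversion instead of throwing.
template <typename To, typename From>
To CheckedNarrow(From v) {
  const To narrowed = static_cast<To>(v);
  if (static_cast<From>(narrowed) != v) std::abort();  // value did not survive the round trip
  return narrowed;
}

int main() {
  const int64_t element_count = 4096;
  const int32_t checked = CheckedNarrow<int32_t>(element_count);  // use a check when the range is not obvious
  const uint32_t obvious = static_cast<uint32_t>(checked);        // obviously safe: no check needed
  return obvious == 4096 ? 0 : 1;
}
```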
jchen351 added a commit that referenced this pull request on Mar 25, 2025
…disabled in macOS and iOS packaging stage due to (#24152) (#24153). NuGet_Packaging_CPU is broken due to a similar issue from #23923.
### Description
Migrate the [Zip-Nuget Package Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=940&_a=summary) to 1ES.
### Checklist
- [x] Issue with onnxruntime-Win-CPU-2022
- [x] [Spot Bug](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=697830&view=logs&j=6c6a898f-bbbb-5c72-8695-82b606149fa2&t=433f102b-5ed3-5fed-87a0-6107744ce9b1&l=81)
jatinwadhwa921 added a commit to intel/onnxruntime that referenced this pull request on Apr 10, 2025
* Quant tool: Add `nodes_to_exclude` in `get_qnn_qdq_config` (#23779) * [ORT/CI_Pipeline] Use --enable_generic_interface in ORT builds for EP testing (#23801) Summary of changes: - Changed openVINO test case to use --enable_generic_interface - changed tensorRT test case to use --enable_generic_interface - Fixed ORT builds to USE_FULL_PROTOBUF as openVINO/TensorRT requires them - Fixed pre-processor macro definition which accidently got removed when ORT is build w/o EP ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Karim Vadsariya <[email protected]> * Increase npm package pipeline ReactNative_CI_iOS timeout to 120 mins (#23825) ### Description Increase [npm package pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1080&_a=summary) ReactNative_CI_iOS timeout to 120 mins ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [Mlas] Unblock hardcoded matmul blocking size (#23815) ### Description In GemmBatch, target matrix is cut into blocks to dispatch to multiple threads for intra-op parallelism. Currently the block size hard-coded to 16. If the CPU has > 16 cores, cores are not fully utilized in one op. This change unblocks the number of blocks in various MatMul. __Benchmark results__ Model: llmlingua-2-bert-base-multilingual-cased-meetingbank--add-force-token-100--max-seq-len-512-CPU-INT8.onnx set up: 96 core x86 linux Before: Setting intra_op_num_threads to 64 Overriding dimension with name, batch_size, to 3 Session creation time cost: 0.485097 s First inference time cost: 356 ms Total inference time cost: 17.731 s Total inference requests: 50 __Average inference time cost: 354.619 ms__ Total inference run time: 17.7312 s Number of inferences per second: 2.81989 Avg CPU usage: 65 % Peak working set size: 542265344 bytes Avg CPU usage:65 Peak working set size:542265344 After: Setting intra_op_num_threads to 32 Overriding dimension with name, batch_size, to 3 Session creation time cost: 0.523394 s First inference time cost: 316 ms Total inference time cost: 12.2739 s Total inference requests: 50 __Average inference time cost: 245.478 ms__ Total inference run time: 12.2741 s Number of inferences per second: 4.07362 Avg CPU usage: 33 % Peak working set size: 611241984 bytes Avg CPU usage:33 Peak working set size:611241984 Setting intra_op_num_threads to 64 Overriding dimension with name, batch_size, to 3 Session creation time cost: 0.497698 s First inference time cost: 289 ms Total inference time cost: 9.49205 s Total inference requests: 50 __Average inference time cost: 189.841 ms__ Total inference run time: 9.49226 s Number of inferences per second: 5.26745 Avg CPU usage: 65 % Peak working set size: 548470784 bytes Avg CPU usage:65 Peak working set size:548470784 Runs:50 ### Motivation and Context This issue is reported by M365 research team. * Revert changes onn mac-react-native-ci-pipeline.yml (#23845) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Fix flash attention for GQA (Phi4) (#23850) ### Description This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be `k_start + capped_sg_id < seq_causal_length` check. 
This is either because, a. seq_causal_length varies per lane, so the check becomes non uniform control flow, which is having interactions with subgroupShuffle. or b. The check itself is incorrect and is wiping out values of v based on the source lane's seq_causal_length. While in actualness values of v need to be causal as per the lane that is going to multiply it with qkt. qkt is already causal because earlier values of qk for out of bounds k are set to min_value, and exp(<-4) are 0. This fix works by removing that causal check and relying on the qk being wiped out earlier. The documentation for causality behavior for GQA is missing to determine which of this reason is the true reason. Prior to this prompts with sequence length > 16 < 32 or 1k would break with Phi 4 but smaller prompts would work. Tested on Intel Alderlake, Nvidia 4070. * Model Builder API (#23223) ### Description <!-- Describe your changes. --> Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Fix typo: change `Upample` to `Upsample`. (#23838) ### Description <!-- Describe your changes. --> Fixed a typo in function names related to the Upsample CUDA kernel. Changed incorrect spelling Upample to Upsample across relevant functions. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is necessary to maintain consistency and prevent potential confusion caused by incorrect function names. * [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (#23848) ### Description <!-- Describe your changes. --> Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (#23856) * Change the logic to generate the default ep context file name (#23788) Change the logic to generate the default ep context file name ### Description Applies to all EPs: replace the .onnx to _ctx.onnx, instead of directly append extra string _ctx.onnx to existing model path. In QNN EP, also make the context binary .bin file shorter by removing QNNExecutionProvider_ from the file name. * Make Nuget QNN package pipeline 1ES compliant (#23805) ### Description Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234)1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [js/common] allows using Uint16Array as data for float16 tensor (#23827) ### Description Resolve #23817 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [js/webgpu] Reland the optimization of ConvTranspose (#23858) This PR fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation. 
* [OpenVINO] Fix a build warning (#23877) ### Description Fix a warning with std::move usage ### Motivation and Context Possibly allow building without --compile_no_warning_as_error flag * Change gsl::byte to std::byte (#23872) To be compatible with the latest GSL library. Without this fix we will get: ``` onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead. ``` * Allow using extended minimal build for several EPs (#23834) ### Description #### Background From code search, the following EPs use `onnxruntime::GetCpuPreferredNodes()` in their `GetCapabilities()` methods: - CANN - CUDA - DML - JS - ROCM - WebGPU However, the source file that implements `onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is ON: https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42 This means that all EPs mentioned above is not able to compile with minimal build. #### Solution The excluded file `core/framework/fallback_cpu_capability.cc` cannot build in minimal build because some of its dependencies are not included in the minimal build. However, in extended minimal build mode, all dependencies are available. This PR looses the restrict and allows to compile this file when it is extended minimal build. After this change, those EPs are able to compile in extended minimal build. * Add dawn to ThirdPartyNotices (#23876) ### Description Add `dawn` to ThirdPartyNotices. * Enable QNN EP weight sharing generation using public API (#23702) ### Description Enable QNN EP weight sharing generation using public API instead of internal interfaces, so that user can integrate into their own toolchain. The change is to share the QnnBackendManager across ORT sessions if ep.share_ep_contexts is enabled. And there is extra option to end the share so that we know when to remove the shared QnnBackendManager from the singleton. Change the tool name from onnxruntime_qnn_ctx_gen to ep_weight_sharing_ctx_gen, so that it can be shared for other EPs. * [QNN-EP]: Fix inference failures while running with htp_shared_memory (#23892) ### Description When using the enable_htp_shared_memory feature, we see that the address of the buffer passed to rpcmem_free is incorrect. So the rpc buffers are not freed leading to memory exhaustion. ### Motivation and Context When using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions, it leads to inference failures during the second prompt. As GenAI memory asks are higher, it surfaces sooner in gen AI use cases. Co-authored-by: Ashish Garg <[email protected]> * Fix enable_pix_capture build for WebGPU (#23857) The build option --enable_pix_capture is broken. This fixes the problem. --------- Co-authored-by: wp <[email protected]> * [WebGPU-EP Native] Add ReduceMean (#23860) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [WebGPU EP] introduce BiasAdd contrib op (#23861) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Dynamo export and improve benchmark script for SAM2 encoder (#23887) ### Description * Add dynamo export for Sam2 image encoder * Verify fp32 onnx model with CPU EP (to avoid error message from TRT EP). 
* Update benchmark script: - output ORT profiling - output torch compiled code and unique kernel name for compiled kernel - add an option for nightly package installation - uninstall existing ort packages before installing The node metadata of dynamo exported model can help mapping node in onnx model back to pytorch modeling script. Currently, the graph optimization is not done on dynamo exported model, so it is experimental right now. ### Motivation and Context To support profiling of torch compiled CUDA kernel. * [js/web] improve workaround for bundlers (#23902) ### Description This PR improves the workaround for bundlers in onnxruntime-web. Specifically, the following changes have been made: - Use [this workaround](https://github.com/xenova/onnxruntime/commit/9c50aa2c63bad4cb73ad77ff1c43e0c43da0907f) as suggested by @xenova in https://github.com/huggingface/transformers.js/pull/1161#issuecomment-2695785730 - Use `url > "file:" && url < "file;"` instead of `url.startsWith("file:")` to allow minifiers to remove dead code correctly. This change allows to remove unnecessary dependencies of file parsed from `new URL("ort.bundle.min.js", import.meta.url)` in Vite, and optimize code like `if("file://filepath.js".startsWith("file:")) {do_sth1(); } else {do_sth2();}` into `do_sth1()` for webpack/terser usages. Resolves https://github.com/huggingface/transformers.js/pull/1161 * [webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (#23349) ### Description This change restores the MatMulNBits workgroup size from (8, 8, 1) back to (16, 8, 1) to resolve a performance regression observed on Intel iGPUs during token generation (M=1). ### Motivation and Context As above. Signed-off-by: Jianhui Dai <[email protected]> * [webgpu] support Pad operator (#23141) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [WebNN] Accept Float16Array for float16 data type if it is available (#23894) Float16Array is now shipping and WebNN Chromium implementation has accepted it. We should allow it in WebNN EP as well. * Ensure that the 'cmake_minimum_required' is version 3.5 or greater (#23888) ### Description CMake 4.0 release candidate 2.0 is available, and it cannot compile all of OnnxRuntime out-of-the-box. There's portions of the OnnxRuntime codebase that specify a `cmake_minimum_required` version of 3.0, and CMake 4.0 has removed support for compatibility with CMake < 3.5 - the following error is reported: ``` CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required): Compatibility with CMake < 3.5 has been removed from CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway. ``` Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to set that as a minimum version to fix the error. The root CMakeLists.txt does ask for a minimum version of 3.28, so we could snap to that, but I'm still ramping up on the build, so wanted to propose a minimally sufficient fix. ### Motivation and Context Being able to build with the latest CMake - when it ships - reduces the barrier to entry to building OnnxRuntime, and allows the OnnxRuntime to leverage the latest and greatest tooling. 
* WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (#23898) This PR removes the deprecated subgroups-f16 from WebGPU native and JS EP, and also remove the unused deviceInfo in WebGPU JS EP. * [JSEP/WebGPU] Fixed error in softmax dispatch. (#23906) ### Description Fixed an error softmax dispatch ### Motivation and Context Produce expected results for LlaMA model * enable WebGPU EP in WebAssembly build (#23913) ### Description This PR is the first step for migrating the webgpu backend of onnxruntime-web from JSEP based to WebGPU EP based. In this change, we enable building WebGPU EP in a wasm build (ie. `--build_wasm` `--use_webgpu` `--use_jsep`). However, the old build flags should still keep previous behavior. * Adding OpenVINO Windows CI Pipeline (#23919) ### Description <!-- Describe your changes. --> Enable an OpenVINO Windows CI pipeline. This includes: - Downloading the OpenVINO toolkit for Windows from an external source. - Setting up OpenVINO environment variables. - Building the ONNX Runtime OpenVINO Execution Provider. - Running unit tests. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is required to run checks on precommit and commit in the ONNX Runtime project. It ensures that the code is tested with the OpenVINO toolkit on Windows, improving the reliability and compatibility of the project. * [WebGPU EP] SoftMax Implementation (#23538) Increase coverage for WebGPU Op * Exclude MAUI projects from GPU C# packaging builds (#23923) ### Description <!-- Describe your changes. --> Use 'desktop only' solution in GPU C# packaging builds. We don't need to include any MAUI support for those builds. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Support all block sizes that are multiples of 32 for DP4A (#23907) ### Description Simple change 1. The DP4A shader actually supports all block sizes that are multiples of 32, relaxing the restriction and making a small tweak to support sizes other than 32. 2. Moved the shader to a separate file for maintainability. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Example custom op with output type inferencing (#23916) ### Description <!-- Describe your changes. --> Add example of a custom op that is required to do type inference for the output type for the model load to work. Also acts as an example of how to override an ONNX op with a custom implementation. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #23891 * Enabling L2+ Optimizations for EPs (#23517) There are some requirements to modify the graph which are specific to the EP/hardware. ORT has the hardcoded EP list for optimizations but that can't scale and it's hard be extended to enable EP custom optimizations. Here is the prototype to enable L2+ optimizations for EPs (The original overview is provided by @skottmckay) as well as the TRT EP implementation for the ConstantFoldingDQ optimization. 
Signatures for selection and optimization functions: ```` - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)> - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)> ```` GetCapability - call (new) provider bridge API to lookup pre-defined optimizer by name and get selection function - ComputeCapability.optimize_func, i.e. optimization function, would be set by the optimizer to the function that does the optimization - EP has to update the returning ComputeCapability to include the optimization ComputeCapability in nodes_to_optimize. So that later ORT can perform optimization/transformation accordingly. GraphPartitioner - After assigning the ComputeCapability to the EP and prior to Compile, if the ComputeCapability has nodes_to_optimize, iterate that list - optimization function needs to be called with - a mutable Graph instance - the ComputeCapability for the individual optimization - the overall ComputeCapability so it can be updated * fix binplace file in web pipeline (#23930) * Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (#23931) * Fix ConvInteger handling of optional inputs. (#23935) ### Description <!-- Describe your changes. --> Fix ConvInteger handling of optional inputs. Need to check Exists() and not just the number of inputs. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #23927 * Updated ov version in pipeline (#595) (#23882) ### Description This PR updates the OpenVINO version used in the pipeline from 2024.5.0 to 2025.0.0 Co-authored-by: jatinwadhwa921 <[email protected]> * [AIX] External data handling (#23859) ### Description In BE system, model tensor data coming from external file is not handled properly. This was found during the debugging of (https://github.com/microsoft/onnxruntime-genai/issues/1104)(url) This PR changes do the endianness conversion of data loaded from external file in BE system. * Create a packaging pipeline for a custom nuget package (#23918) * Fix license in example test code. (#23936) * replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (#23926) ### Description `gsl::narrow` does not work in no exception build. - use `onnxruntime::narrow` if necessary; - or change to `static_cast` if it's obviously safe. also apply the changes to usage of `gsl::narrow_cast`, which does not apply checks. * VCPKG improvement: set VCPKG_OSX_DEPLOYMENT_TARGET (#23933) ### Description 1. Set VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets 2. Enable VCPKG in more pipelines. * Allow using a different version of flatbuffers when building with vcpkg (#23946) ### Description Allow using a different version of flatbuffers when building with vcpkg, so that users do not need to pin flatbuffer's version, which provides more flexibility in the build process. Delete utf8_range from the dependencies, because it is an indirect dependency of protobuf, which is already included in the build process. ### Motivation and Context * Make python package pipeline 1ES compliant (#23800) ### Description Make [Python packaging pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841) 1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> ### Checklist - [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless * Delete ROCM Nuget Publishing Pipeline (#23948) * Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#23924) Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.9 to 2.1.10. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/SixLabors/ImageSharp/releases">SixLabors.ImageSharp's releases</a>.</em></p> <blockquote> <h2>v2.1.10</h2> <h2>What's Changed</h2> <ul> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2859">#2859</a> to release/2.1.x by <a href="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/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2890">SixLabors/ImageSharp#2890</a></li> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy] by <a href="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/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2891">SixLabors/ImageSharp#2891</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/SixLabors/ImageSharp/commit/d133ef99e8becfc3b924b0bb4315e63b8681d307"><code>d133ef9</code></a> Set lang version</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/5dfe5a800367581239de442cc18de659da6e9b1d"><code>5dfe5a8</code></a> Missed cache action update</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4d3a85112b03c89d2cb8616a5b747684b6e73730"><code>4d3a851</code></a> Use latest cache action</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4cb9f40a722ab2b837157862f0320c6a652da4d0"><code>4cb9f40</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2891">#2891</a> from SixLabors/af/backport-2701</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/bb82f79db0197166271d4355b5fb5ceda370a906"><code>bb82f79</code></a> <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy]</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/627b5f721f30f6d529acb50bd81f92bd3db754eb"><code>627b5f7</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2890">#2890</a> from SixLabors/af/backport-2859</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/67f7848d6e975e7956c8056823555de49a5fdf6d"><code>67f7848</code></a> try to fix LFS for *.BMP</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/44d294e06606111195152ead3006452357ef1bb9"><code>44d294e</code></a> 8.0.x is not needed</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/adb85d9e66aa3a588a86f4a4ef9a0539a8502117"><code>adb85d9</code></a> Another attempt for a Linux-specific skip</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/efc3fc4ee15eec4e523c26f7130e786541b00df2"><code>efc3fc4</code></a> Disable BmpDecoder_CanDecode_Os2BitmapArray on Linux</li> <li>Additional commits viewable in <a 
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Make python CUDA package pipeline 1ES compliant (#23802) ### Description Make [Python-Cuda-Publishing-Pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1311&_a=summary) 1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Migrate yarn to npm (#22116) ### Description This PR change all reference to yarn to npm ### Motivation and Context This PR is needed to address all Component Governce issue that ORT is facing ### Current issue - [x] use_react_native!(:path => config["reactNativePath"]) return nil - [x] For error `CocoaPods could not find compatible versions for pod "RCTRequired"`, we might need to increase iOS targe version from 13.0 to a higher version. - [x] For 'react-native' >= 0.73.x , react-native/react.gradle file is no longer used - [x] We need to update to gradle 7.6 or above to upgrade the RN. current gradlew version 7.3.3 that we use does not works on RN 71+. - [x] Instruction on how to implement the React-Native has changed since [0.72](https://reactnative.dev/docs/integration-with-existing-apps). - [x] Error `The new Java toolchain feature cannot be used at the project level in combination with source and/or target compatibility` from gradle. 
- [x] duplicate class: com.facebook.react.PackageList solution: remove `apply from: file("../../node_modules/@react-native-community/cli-platform-android/native_modules.gradle"); applyNativeModulesAppBuildGradle(project)` from bottom of andoird/app/build.gradle - [x] Need to update the OnnxruntimeModuleTest because `ReactApplicationContext` is now a abstract class. --------- Co-authored-by: Edward Chen <[email protected]> * [WebGPU/JSEP] Support group query attention do_rotary attribute (#23524) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Fix npm audit in js/react-native/e2e (#23975) * Suppress some warnings in WebGPU EP generated by GCC 13 (#23984) ### Description Replace #23445, resolve conflicts and add one new file. --------- Co-authored-by: Changming Sun <[email protected]> * Fix NPM audit in js/react-native (#23974) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Bump axios from 1.7.9 to 1.8.2 in /js/node (#23963) * GCC 14: fix insert_or_assign() call (#23955) Resolve #23954 * ADD emsdk env vars to VCPKG_KEEP_ENV_VARS (#23997) ### Description The vars are set by cmake\external\emsdk\emsdk_env.bat ### Motivation and Context By default they are filtered by vcpkg to make build reproducible. However, emscripten's cmake toolchain file needs this information. emcc.bat has the following code: ``` @set EM_PY=%EMSDK_PYTHON% @if "%EM_PY%"=="" ( set EM_PY=python ) ``` Actually, it doesn't work as expected. the line ``` set EM_PY=python ``` should be changed to ``` set EM_PY=python.exe ``` We haven't hit this issue because usually the var EM_PY is set. * Fix ONNX Runtime Python Test Pipeline (#23990) ### Description [Fix ONNX Runtime Python Test Pipeline ](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1164&_a=summary) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [webgpu] Fix the continuation issue (#23999) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [WebGPU EP] Implements Gelu, BiasSplitGelu, and QuickGelu (#23981) Increases WebGPU operator coverage * [Native WebGPU] Added ReduceMax and ReduceSum (#23934) ### Description Added ReduceMax and ReduceSum ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Convert Windows CPU CI Pipeline to Github Actions (#23996) * [Fix] Dependencies find_package Eigen error (#23939) ### Description To fix the CMake configuration error when a dependency brought in via FetchContent uses find_package(Eigen3 REQUIRED) Major Changes: - enable EIGEN_BUILD_CMAKE_PACKAGE - [optional] rename eigen to Eigen3 ### Motivation and Context Get the following build error when Dependencies use find_package(Eigen3 REQUIRED) ``` By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "Eigen3", but CMake did not find one. 
Could not find a package configuration file provided by "Eigen3" with any of the following names: Eigen3Config.cmake eigen3-config.cmake Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set "Eigen3_DIR" to a directory containing one of the above files. If "Eigen3" provides a separate development package or SDK, be sure it has been installed. ``` Eigen need enable **EIGEN_BUILD_CMAKE_PACKAGE** when FetchContent for generate **Eigen3Config.cmake** https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L213 in addition , the eigen‘s project name is "Eigen3" and providing the cmake configuration file is "Eigen3Config.cmake" : https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L36 https://gitlab.com/libeigen/eigen/-/blob/master/CMakeLists.txt?ref_type=heads#L252 So I think it's best for FetchContent_Declare Name to be consistent with the project name to avoid potential errors. Co-authored-by: mingyue <[email protected]> * Update onnxruntime_c_api.h to work with MinGW (#24006) ### Description Same as #23169 ### Motivation and Context Same as #23169 * Add DNNL github workflow (#24011) ### Description Add DNNL github workflow which is migrated from "Windows CPU CI pipeline" from Azure DevOps. This PR also adds "--build_nuget" to test the C# part. However, then I hit an error when building the tests in "test\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp.csproj". The error message was: ``` D:\a\_work\onnxruntime\onnxruntime\csharp\test\Microsoft.ML.OnnxRuntime.Tests.Common\TrainingTest.cs(34,81): error CS0103: The name 'CheckpointState' does not exist in the current context [D:\a\_work\onnxruntime\onnxruntime\csharp\test\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp\Microsoft.ML.OnnxRuntime.Tests.NetCoreApp.csproj] ``` Then I checked the code. I couldn't understand how it worked before. In this build, `__TRAINING_ENABLED_NATIVE_BUILD__` is not defined. But the "CheckpointState" class is defined in https://github.com/microsoft/onnxruntime/blob/main/csharp/src/Microsoft.ML.OnnxRuntime/Training/CheckpointState.shared.cs#L21 And the file is empty when __TRAINING_ENABLED_NATIVE_BUILD__ is not defined. So I don't understand how it could work in a normal build without dnnl. Here is my build command: ``` python tools\ci_build\build.py --config RelWithDebInfo --build_dir dnnlbuild --skip_submodule_sync --build_csharp --parallel --use_binskim_compliant_compile_flags --cmake_generator "Visual Studio 17 2022" --build_shared_lib --enable_onnx_tests --build_wheel --msbuild_extra_options "IncludeMobileTargets=false" --build_nuget --use_vcpkg --use_vcpkg_ms_internal_asset_cache --use_dnnl ``` This PR removes the failed test. * Qnn weight sharing improvement (#23945) ### Description Qnn weight sharing improvement so that only the last session in the weight sharing group (the session that has both share_ep_contexts and stop_share_ep_contexts enabled) generates the .bin file. The .bin file name is decided from the 1st session. And all generated *_ctx.onnx models point to this single .bin to avoid post-processing work. Previously each session generates a _ctx.onnx model with a .bin file. So it requires post-processing work to go through generated *_ctx.onnx models to get the last generated *_ctx.bin file and update all *_ctx.onnx to point to the same .bin file and remove the .bin files not used. 
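To make the weight-sharing flow described above concrete, here is a minimal Python sketch of creating two QNN sessions in one weight-sharing group, where only the last session signals that sharing stops and therefore emits the single shared `_ctx.bin`. The session config key strings and the model file names are assumptions inferred from the option names mentioned in the commit message, not code taken from this PR.

```python
import onnxruntime as ort

def make_session(model_path: str, stop_sharing: bool = False) -> ort.InferenceSession:
    so = ort.SessionOptions()
    # Ask the EP to dump an EP context model (<model>_ctx.onnx); assumed key name.
    so.add_session_config_entry("ep.context_enable", "1")
    # Join the weight-sharing group; assumed key name for share_ep_contexts.
    so.add_session_config_entry("ep.share_ep_contexts", "1")
    if stop_sharing:
        # Assumed key for stop_share_ep_contexts: only the last session in the
        # group sets this, so only it generates the shared .bin file.
        so.add_session_config_entry("ep.stop_share_ep_contexts", "1")
    return ort.InferenceSession(model_path, so, providers=["QNNExecutionProvider"])

# Hypothetical model names: earlier sessions contribute weights, the last one dumps the .bin.
first = make_session("part1.onnx")
last = make_session("part2.onnx", stop_sharing=True)
```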
* Correct generated cmake syntax (#24016) ### Description Previously will got CMake Error at build/Android/intermediates/armeabi-v7a/vcpkg/buildtrees/0.vcpkg_dep_info.cmake:15: Parse error. Expected a newline, got identifier with text "set". * [webgpu] allow to specify UseIndicesTypeAlias for Indices (#24019) ### Description Allow to specify `UseIndicesTypeAlias` for `AddIndices` in `ShaderHelper`. * [webgpu] allow overloads to Program::AddIndices (#24021) ### Description This change allows more overloads for the `Program::AddIndices` method, and makes use of r-value references for parameters when possible. Also fixed the implementation of the `AddInputs` and `AddOutputs` methods to use r-value references for the parameters * fix test for RotaryEmbedding (#24022) ### Description the `BaseTester::Run` function signature is: ```c++ void BaseTester::Run(ExpectResult expect_result, const std::string& expected_failure_string, const std::unordered_set<std::string>& excluded_provider_types, const RunOptions* run_options, std::vector<std::unique_ptr<IExecutionProvider>>* execution_providers, ExecutionMode execution_mode, const Graph::ResolveOptions& options); ``` Its behavior is: - if the parameter `execution_providers` is empty, it will try to aggregate all execution providers available in the build, and for each EP, create inference session and perform test. - if the parameter `execution_providers` is not empty, it will run a single inference session, use the passed-in `execution_providers` as session options and perform test. The old code may put multiple EPs into single inference sessions, but at runtime there will be only one EP running the test. Specifically, WebGPU EP is after CPU EP in this case, so the test never run on WebGPU EP. **To reviewers**: if you see **a lot of** changes, click the "setting" button next to the "Jump to", <img width="277" alt="image" src="https://github.com/user-attachments/assets/e8947ffb-f230-4c59-a5b7-36c0aedd2b7c" /> and check the "Hide Whitespace" and load it again. <img width="137" alt="{4D60F676-35F4-4546-B8E1-E2F42411A9E6}" src="https://github.com/user-attachments/assets/f4c58e6e-c290-49f7-aca7-c413db1e3c77" /> * Fix attention bias broadcast (#24017) ### Description * Fix broadcast on attention bias dim 1. * Increase test cases in test_mha.py in pipeline to cover the testing. ### Motivation and Context This feature was added in https://github.com/microsoft/onnxruntime/pull/21710. There was bug when computing the offset when attention bias broadcast on dim 1 only in both CUDA and CPU kernel. It can be triggered when attention bias shape is like [batch_size, 1, sequence_length, total_sequence_length] and batch_size > 1 when unfused kernel is selected. Note that cudnn flash attention and cutlass fused attention also supports attention bias, so the bug in unfused kernel was not discovered previously. * Remove unused parameter in csharp InferenceTest (#24031) ### Description Fix a warning from analyzers: ``` Theory method 'CanRunInferenceOnAModelDotnetTensors' on test class 'InferenceTest' does not use parameter 'enableParallelExecution'. Use the parameter, or remove the parameter and associated data. (https://xunit.net/xunit.analyzers/rules/xUnit1026 ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> * [TensorRT EP] Call cudaSetDevice at compute function for handling multithreading scenario (#24010) The GPU device is set again at compute function/compute time to handle multithreading scenarios. Consider the following: Users can create multiple threads to initialize separate inference sessions on different devices (not just the default device 0) Later, additional threads may be spawned to execute inference_session.Run(), which calls this compute function. Since new threads default to using device 0, it’s necessary to explicitly set the correct device to ensure computations run on the intended GPU. Example code: ````python provider = [ [ ('TensorrtExecutionProvider', { 'device_id': 0, }), ], [ ('TensorrtExecutionProvider', { 'device_id': 1, }), ] ] class ThreadObj(): def __init__(self, model_path: str, iterations: int, idx: int): ... sess_opt = ort.SessionOptions() self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2]) def warmup(self): self.inference_session.run(None, self.input) def run(self, thread_times, threads_complete): for iter in range(self.iterations): self.inference_session.run(None, self.input) def thread_target(obj, thread_times, threads_complete): obj.run(thread_times, threads_complete) ... iterations = 500 num_threads = 13 t_obj_list = [] thread_list = [] for tidx in range(num_threads): obj = ThreadObj(model_path, iterations, tidx) t_obj_list.append(obj) obj.warmup() for t_obj in t_obj_list: thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj,thread_times,threads_complete,)) thread.start() thread_list.append(thread) ... ```` Note: Based on our measurements (using cuda event) on the A100 GPU with CUDA 12, the execution time for `cudaSetDevice` is approximately 0.004 ms, which is negligible and does not impact runtime performance. * Increase timeout for ARM64-Xcode16-targeting-iphonesimulator (#24030) * Support tvOS build (#24000) * [TensorRT EP] Stop enforcing oss parser during Windows debug build (#24036) ### Description <!-- Describe your changes. --> Reverting as this issue disappeared after adapting newer TRT api. This has been validated by building ORT 1.20.1/1.21.0 debug build and testing on FRCNN/resnet50 models. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Set CMAKE_POLICY_DEFAULT_CMP0069 to NEW to ensure that IPO flags are added for dependencies. (#24034) Set CMAKE_POLICY_DEFAULT_CMP0069 to NEW to ensure that interprocedural optimization (IPO) flags are added for dependencies. If the OLD behavior is used, the IPO flags are only added for the Intel compiler on Linux. * Make Cuda packaging pipeline 1ES compliant (#23806) ### Description Make [Cuda packaging pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1287&_a=summary) 1ES compliant ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Check List - [x] pool `onnxruntime-Win-CPU-2022` not found * [webgpu/wasm] allow runtime switch between WebGPUEP and JSEP (#24032) ### Description Add `--webgpu-ep=runtime` to allow build ort-web with both WebGPUEP and JSEP, while at runtime use `globalThis.WEBGPU_EP` to switch between them. This change helps to do perf comparison between WebGPU EP and JSEP much easier. * Move call to MLAS_CPUIDINFO::GetCPUIDInfo() out of MlasSQNBitGemmDispatchNeon initialization. 
(#24018) Move call to `MLAS_CPUIDINFO::GetCPUIDInfo()` out of `MlasSQNBitGemmDispatchNeon` initialization. Reduce binary size when MatMulNBits op is not included in the build. I believe the side effect of `MLAS_CPUIDINFO::GetCPUIDInfo()` (e.g., initializing a static object) prevents the linker from discarding the code in a build where the associated MLAS functions are unused. * [webgpu] fix the wrong dispatch size in flash_attention (#24020) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Yulong Wang <[email protected]> * avoid copy unnecessary files for nodejs pkg (#23992) ### Description remove duplicated file in nodejs package. #23956 * Add support for custom position ids and attention bias to GQA CPU operator (#23944) ### Description - Added support for custom position ids and attention masks to the GQA CPU operator (fp32 and fp16) - Added MLAS eltwise add kernel for mask application for FP32 and FP16 - Added unit tests for the added eltwise add MLAS kernel - Modified python tests to test the new GQA inputs ### Motivation and Context Custom position ids and attention mask are required in order to implement speculative decoding in PhiSilica ### Benchmarks All the benchmarks are executed on the GQA op configuration which will be used in the PhiSilica speculative decoding secnario, and the configuration is as follows: - num_heads: 32 - kv_num_heads: 32 - do_rotary: 1 - local_window_size: -1 - head_size: 96 - sequence_length: 6 - packed_qkv: True Benchmarks were executed on Cadmus with Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz In the tables below, column headers are total sequence length values used for benchmarking, and the row values are if the attention bias was used or not. Values are average inference time in ms over 100000 runs. 
#### Fp16 results | Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 | |:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------| | Without bias | 0.284054 | 0.257449 | 0.275806 | 0.334123 | 0.458324 | 0.614133 | 0.912791 | 1.38585 | 1.92186 | 2.39203 | 2.88808 | 3.46262 | | With bias | 0.250926 | 0.253072 | 0.279724 | 0.337774 | 0.499058 | 0.585388 | 0.914316 | 1.40701 | 1.87311 | 2.47475 | 3.3906 | 3.47474 | | Runtime increase | -11.66% | -1.7% | +1.42% | +1.09% | +8.89% | -4.68% | +0.17% | +1.53% | -2.54% | +3.46% | +17.4% | +0.35% | #### Fp32 results | Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 | |:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|:--------| | Without bias | 0.259049 | 0.270541 | 0.304583 | 0.376708 | 0.554013 | 0.633217 | 1.20696 | 1.65985 | 1.95169 | 2.45807 | 3.05637 | 4.05169 | | With bias | 0.261631 | 0.268002 | 0.300853 | 0.370452 | 0.529865 | 0.735216 | 1.43493 | 1.4385 | 1.99028 | 2.3858 | 2.99425 | 4.80197 | | Runtime increase | +1.0% | -0.94% | -1.22% | -1.66% | -4.36% | +16.11% | +18.89% | -13.34% | +1.98% | -2.94% | -2.03% | +18.52% | --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * [WebNN] Better int64 integration (#23831) This PR adds some workarounds to enable int64 support for some WebNN backends which don't support int64 data type. - Do not fallback ops that are specifically due to the int64 limitation. - Convert all int64 initializer and input values to int32 and handle potential overflow errors. - Register all int64 model inputs and outputs as int32 ml-tensor. - Handle ONNX ops that need inputs or outputs conversion between int64 and int32. e.g. ArgMax, ArgMin, Cast, etc. - Convert int64 output data back to int32. - Disallow int64 outputs as 'ml-tensor' preferredOutputLocation. Fixed #21401 * Convert Windows GPU pipelines and Windows OpenVino pipeline to Github Actions (#24029) ### Description Convert Windows GPU pipelines and Windows OpenVino pipeline to Github Actions * [ARM CPU] Fix fp16 const initialization on no-fp16 platform (#23978) ### Description Fix fp16 const initialization on no-fp16 platform [such as Raspberry PI](https://github.com/microsoft/onnxruntime/issues/23957) ### Motivation and Context Resolve #23957 * [Native WebGPU EP] Add packedQKV and do_rotary attribute support to GroupQueryAttention operator (#23386) ### Description Add Packed QKV inputs and do_rotary attribute to GQA. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Packed QKV inputs and do_rotary attribute are required for certain models. * Whisper Redesigned Solution (#23549) ### Description This PR re-designs how Whisper is created and supported in ONNX Runtime. The new solution leverages [previous optimization work](https://github.com/microsoft/onnxruntime/pull/15473), and it is designed to be used in conjunction with [this work](https://github.com/microsoft/onnxruntime-genai/pull/1229) in ONNX Runtime GenAI. 
Some of the added changes include: - Re-designed export that creates new ONNX models without needing a `WhisperBeamSearch` op - Creates one encoder model that also pre-computes the cross-attention KV caches (since they only need to be calculated once) - Creates one decoder model that can be used during pre-fill and token generation - Creates one jump-times model that can be used for word-level timestamps - Removes need for a `WhisperBeamSearch` op to chain the encoder and decoder subgraphs - Removes need to duplicate decoder's weights in memory - Previous solution with the `WhisperBeamSearch` op created an encoder-decoder-init model and decoder-with-past model. The decoder was duplicated twice, one in each. - Removes need for separate logic to export the PyTorch model coming from OpenAI vs. the PyTorch model coming from Hugging Face - Re-factors common parameters and logic used in CPU and CUDA attention kernels - Adds `DUMP_STRING` to enable easy logging of intermediate information when running in debug mode to debug a problem. This info is not printed in release mode so it will not impact performance. - Integrates `DecoderMaskedMultiHeadAttention` into `MultiHeadAttention` - Enables past-present buffer sharing in the `MultiHeadAttention` op for improved performance - Adds `cache_indirection` and `past_sequence_length` as new optional inputs to `MultiHeadAttention` - Adds `output_qk` as new optional output to `MultiHeadAttention` - Enables calculating `output_qk` tensor with FP16 or FP32 precision, regardless of the model's precision - CI tests that run end-to-end across various flag combinations that are used by many customers internally and externally The existing solutions are still available if desired. ### Known Issues - The FP32 CPU model with the `WhisperBeamSearch` op and output QK is currently disabled. This is because ONNX Runtime doesn't currently support output QK kernels on CPU, only on CUDA. - The `DecoderMaskedMultiHeadAttention` CPU kernel has a parity mismatch with the `DecoderMaskedMultiHeadAttention` CUDA kernel. - Using `DecoderMaskedMultiHeadAttention` for the FP32 CPU model is not enabled. Currently, it uses `MultiHeadAttention` to avoid the parity mismatch issue. ### Motivation and Context Using the beam search op has made it more difficult to debug and fix errors that are encountered. This new approach is more flexible and more customizable for users (e.g. by running with ONNX Runtime GenAI). It also helps [this issue](https://github.com/microsoft/onnxruntime/issues/18216). --------- Co-authored-by: mindest <[email protected]> * Windows: Show more useful DLL load errors to say exactly what DLL is missing (#24053) ### Description When we fail to load a provider shared DLL in windows, the error is not very specific. Users have to figure out if the onnxruntime file is missing, a cuda file, or cudnn is not installed (and perhaps others). And this is just the cuda provider. It would be far more useful if it would say exactly what file is missing so the user can fix the actual problem. Plus, this will likely result in many fewer github issues regarding this problem, but if they do, they will be much easier to fix. This fix adds a function that will try loading a dll and its dependencies recursively to figure out which file is missing. It uses the OS dbghelp library to do it and is not very complex. 
This also fixes a many year old bug that was introduced in the change to use FormatMessage in env.cc, where the system error would always be an empty string `error 126 ""` due to passing 0 as the format buffer length. We will now see the more useful `The specified module could not be found.` style error messages. ### Motivation and Context Previously if we fail to load the cuda provider, the error would look like this, which is limited: `unknown file: error: C++ exception with description " onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll"` Now it will look like this if cudnn is not installed: `unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which depends on "cudnn64_9.dll" which is missing. (Error 126: "The specified module could not be found.")` If cuda is not installed: `unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which depends on "cudart64_12.dll" which is missing. (Error 126: "The specified module could not be found.")` And if onnxruntime_providers_cuda.dll is not installed: `unknown file: error: C++ exception with description onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Example\Path\To\Library\onnxruntime_providers_cuda.dll" which is missing. (Error 126: "The specified module could not be found.") ` * Extend CMAKE_CUDA_FLAGS with all Blackwell compute capacity (#23928) ### Description <!-- Describe your changes. --> * Update range to build SASS on all arch and PTX on highest arch * when cuda>=12.8, build all arch (including latest blackwell) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list * [WebGPU] Reduce staging buffers for uploading intializers (#23968) This change reduces the number of staging buffers used for uploading initializers to the GPU. On the one hand, we early release the upload staging buffers. On the other hand, we use the BufferMapExtendedUsages feature of Dawn on UMA GPUs, which allows us to directly write into the dest GPU buffer without the need of a staging buffer. To achieve this, we need to ensure the UMA GPU buffers are mapped at creation. We have BufferManager to be awared of OnSessionInitializationEnd(), so that it can handle buffer Create() and Upload() calls properly. Credits to @fs-eire for the overall design of implementation. * [WebGPU EP] Implement Remaining Reduction Ops (#24045) ### Description <!-- Describe your changes. --> Adds naive implementations of ReduceMin, ReduceProd, ReduceL1, ReduceL2, ReduceLogSum, ReduceSumSquare, and ReduceLogSumExp. Will optimize to use shared memory in a later PR. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Increases WebGPU EP operator coverage. 
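For reference, the math behind the reduction operators listed above is compact enough to state as a NumPy sketch. This only illustrates the ONNX semantics (reduction over the given axes with keepdims) that the new naive WebGPU kernels have to match; it is not the shader code itself.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
axis, keep = 1, True

reduce_min = np.min(x, axis=axis, keepdims=keep)
reduce_prod = np.prod(x, axis=axis, keepdims=keep)
reduce_l1 = np.sum(np.abs(x), axis=axis, keepdims=keep)              # sum of |x|
reduce_l2 = np.sqrt(np.sum(x * x, axis=axis, keepdims=keep))         # sqrt of sum of squares
reduce_sum_square = np.sum(x * x, axis=axis, keepdims=keep)          # sum of squares
reduce_log_sum = np.log(np.sum(x, axis=axis, keepdims=keep))         # log of the sum
reduce_log_sum_exp = np.log(np.sum(np.exp(x), axis=axis, keepdims=keep))
```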
* add bool support to EPContext schema to unblock some models (#24065) ### Description add bool support to EPContext schema to unblock some models * [WebGPU EP] fix for reduce min/max error on MacOS CI (#24077) ### Error ```Traceback /onnxruntime/onnxruntime/core/providers/webgpu/reduction/reduction_ops.cc:146 [allow_multi_axes = true] Axes values must be in the range [-rank, rank-1]. Got: 446098880 ``` * Upgrade current MacOS-13 to 14 (#23293) ### Description Upgrade current MacOS-13 to 14 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> - [x] Update the RN to 0.73.x+ to have the newer version of boost --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fix CUDA EP Abs and Sign bfloat16 support (#23914) ### Description <!-- Describe your changes. --> Abs and Sign had bfloat16 kernels created but not registered with the CUDA EP. Additionally Sign bfloat16 didn't work. * register bfloat16 kernels with CUDA EP * fix incorrectly named macro by adding 'X' as they add bfloat16 registration * add specialization for bfloat16 to _Sign * copied existing pattern. not sure if there's a better way * update tests ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #23875 * Improve typing for OrtValue and other public Python interfaces (#24086) ### Description Improve the OrtValue interface typing and changed `staticmethod` to `classmethod` for constructors to follow python conventions (https://google.github.io/styleguide/pyguide.html#2174-decision). * [webgpu] Limit that K must be divisible by 128 to apply dp4a matmul (#24078) The DP4AMatMulQuantize shader needs to make sure that K is divisible by 128. Otherwise, we need align the scale to have shape [M, ceil(K / 128)]. To simplify the shader, we limit that K must be divisible by 128 to apply dp4a matmul. * Add macOS ARM64 pipeline for webgpu (#24060) ### Description Add macOS ARM64 pipeline for webgpu. This pipeline is a temporary one. I created this pipeline because the current code already fails on macOS ARM64 for WebGPU EP. Adding this pipeline allows to check the status of the fix, and eventually when the build passes, this pipeline will be merged with the existing macOS arm64 pipeline. * [WebNN/WebGPU JS] Fix shared Module methods overriding each other (#23998) - Renamed all conflicting WebNN methods from `jsep*` to `webnn*`. - WebNN doesn't need flush(), therefore it doesn't need to set `jsepBackend`. This PR addresses issue microsoft/webnn-developer-preview#78 * Enable multithreading on FP16 to FP32 cast operator (#23619) ### Description Enables multithreading on FP16 to FP32 cast operator. ### Motivation and Context Improves CPU performance on FP16 models that require casting to FP32. * Move Android CI Pipeline to Github Actions (#24094) ### Description Move Android CI Pipeline to Github Actions * Cleanup CoreML EP's code to remove COREML_ENABLE_MLPROGRAM (#23490) ### Description Cleanup CoreML EP's code to remove the COREML_ENABLE_MLPROGRAM macro. Also, increase MINIMUM_COREML_VERSION(first version we support) to 5 . 
* webgpu ep support for argmax/argmin (#24089) * [mobile/reactnative] Remove namespace from AndroidManifest.XML to resolve warning (#23847) ### Description Removes namespace from AndroidManifest.XML ### Motivation and Context - Resolves #21681 * [WebGPU EP] fix implementation of Pow (#24088) ### Description Use custom implementation for Pow to fix test failures. * Increase timeout to 90min for ARM64-Xcode16-targeting-iphonesimulator (#24091) ### Description <!-- Describe your changes. --> There are still some timeout for the pipeline. further extend the timeout to 90 minutes for ARM64-Xcode16-targeting-iphonesimulator. It takes quite a while if all build cache is missing. ### Motivation and Context The pipeline sometimes failed because of timeout. There is a previous PR #24030 to increase the timeout from 60min to 75 min but it looks like not enough. * [WebGPU] fix test failure in Reduce operators on macOS ARM64 (#24108) ### Description fix test failure in Reduce operators on macOS ARM64 ``` [E:onnxruntime:ReduceL1, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running ReduceL1 node. Name:'node1' Status Message: webgpu_context.cc:259 Run Uniform variable[0] (output_size) data type mismatch in program "ReduceL1", Expected: u32, Actual: i32 ``` * [WebGPU EP] Implements CumSum Operator (#24047) Increases WebGPU EP op coverage. * [webgpu] Use 1d dispatch group size (#24084) This PR uses 1d disptach group size and uses workgroup_idx instead of workgroup.x|workgroup.y in case they are normalized. * [WebGPU] fix test failure in MatMulNBits on macOS ARM64 (#24109) ### Description abs_error is slightly loosen from 0.02 to 0.03 to allow test cases on macOS arm64 to pass. * [QNN-EP] Add support for Sum operator with 2 inputs (#24098) ### Description <!-- Describe your changes. --> * Add Sum to op builder in QNN-EP * Now we can limit the support to Sum with 2 inputs. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Enhance QNN-EP support for Sum with two inputs * [WebNN] Replace narrow with SafeInt for consistently in integer handling (#24059) Remove redundant header files BTW. * [QNN-EP] Add Lora Support with offline QNN context binary (#24026) ### Description - Add the new run option called lora_config to feed the information from lora binary - Parse and apply the lora binary in OnRunStart ### Motivation and Context - Support Lora Adapter Binary with QNN Context Binary Usage * [TensorRT EP] support TensorRT 10.9-GA (#23905) ### Description <!-- Describe your changes. --> * Update to trt10.9 * oss parser tested (here's testing method https://onnxruntime.ai/docs/build/eps.html#note-to-ort-1210-open-sourced-parser-users) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [webgpu] Apply dp4a for generation shader (#24064) This pr applies DP4A to generation shader. And also support any block_size % 32 = 0. * [CUDA] Support slide window in cutlass fused attention (#24072) ### Description Add slide window support in cutlass fused attention ### Motivation and Context The change was previously created by Ye: https://github.com/microsoft/onnxruntime/pull/21926 I merged the change and resolved some conflictions. Also reversed some Ye's change in kernel_forward.h, so that our code is consistent with pytorch code. 
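As a side note on what the sliding window means in the attention item above: with a window of size w, a causal query position attends only to the most recent w key positions. The NumPy sketch below is purely illustrative of that masking pattern under this assumption; it is not the cutlass kernel or its API.

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query position i may attend to key position j."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q           # no attention to future positions
    local = (q - k) < window  # only the last `window` positions are visible
    return causal & local

# Example: with seq_len=5 and window=2, position 4 attends to keys 3 and 4 only.
print(sliding_window_causal_mask(5, 2).astype(int))
```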
* [MIGraphX EP] rename HIPPinnedAllocator to MIGraphXPinnedAllocator (#24103) ### Description Rename class HIPPinnedAllocator to MIGraphXPinnedAllocator ### Motivation and Context To align allocators' naming for the MIGraphX EP * [MIGraphX EP] check POLICY CMP0144 availability before used (#24104) ### Description For a newer CMake, suppress warnings about incorrect letter cases in package names. ### Motivation and Context To avoid reporting for newer CMake that a package name contains capital letters when small letters are required. * [JSEP] handles edge case in gridsample operator (#24121) fix for https://github.com/microsoft/onnxruntime/issues/24070 * [OpenVINO]Session Options Appended After AppendExecutionProvider (#23852) Description To honor SessionOption API Contract the ordering of AddConfigOption and AppendExecutionProvider_OpenVINO should not matter. This PR is fixing that issue Motivation and Context This PR fixes a regression happened during last PR in ordering of SessionOptions. * [webgpu]Add MaxPool and AveragePool (#23714) This adds Max and Average pool operators for webgpu-native. Basically, this is a rewrite of the corresponding JSEP operators with some improvements: 1) 'dilations' support 2) Pooling with kernelShape.length > 2 for NHWC format 3) code cleanup However, there are still a few missing features: 1) ceil 'ceil_mode' 2) column major 'storage_order' 3) 'Indices' output for Max pools. * [webgpu EP] put GetMaxComponents and SumVector to one place. (#24122) ### Description put `GetMaxComponents` and `SumVector` to one place. fix a bug in `SumVector`: ```diff - return "(" + x + ".x + " + x + ".y + " + x + ".w + " + x + ".z" + ")"; + return "(" + x + ".x + " + x + ".y + " + x + ".z + " + x + ".w" + ")"; ``` * skip MOE python test when MPI is not installed (#24116) ### Description It is not common that dev machine have MPI installed. Skip the test if MPI is not installed. ### Motivation and Context Make it easy to run pytest in dev machine without the need to skip the test manually. * Integrate KleidiAI for MatMulNBits via MlasQNBitGemm (#23627) ### Description This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels for matrix multiplication with 4-bit quantized weights. These changes target the MlasQNBitGemm functions, and can be utilized via the MatMulNBits operator. * add test cases for webgpu ep in web (#24117) ### Description This PR enables web tests (NPM suite tests) for WebGPU EP. There are some test failures expected, so the specific job is marked as "continueOnError". ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Refactor Webnn IsSupported*() to use constant initializers. (#24118) ### Description <!-- Describe your changes. --> This PR continues the work started at https://github.com/microsoft/onnxruntime/pull/19401. ### Motivation and Context An overridable initializer should not have a fixed value included in an WebNN model as it could be changed at runtime. The current check doesn't include validating that the initializer is constant. * Deleted the constant SKIP_CUDA_TEST_WITH_DML (#24113) ### Description Deleted the constant SKIP_CUDA_TEST_WITH_DML. It does not seem to be used anywhere. ### Motivation and Context The constant SKIP_CUDA_TEST_WITH_DML prohibits onnxruntime to be compiled when both of the flags -use_cuda and -use_dml are set. 
Co-authored-by: Andreas Hussing <[email protected]> * Update T5 Onnx Export and Optimization (#23949) Previously, the encoder onnx model added extra initialization for the decoder to generate the KV cache from the prompt, which is not necessary. Here we redesign the onnx export for the T5 model to output two separate models for the encoder and decoder. The Linear layer that generates cross-attention features from encoder_hidden_states is moved into the encoder onnx model. In this way, the encoder does not need to output encoder_hidden_states and only needs to output the features for cross attention used in the decoder. Major changes: - [x] update t5 onnx export script - [x] update convert_generation script - [x] update beam search to support changes of inputs and outputs (detail can be found below). - [x] add a tiny t5 mode…
zhaoxul-qti
pushed a commit
to CodeLinaro/onnxruntime
that referenced
this pull request
Apr 17, 2025
…idail is disabled in MacOS and iOS packaging stage due to (microsoft#24152) (microsoft#24153) NuGet_Packaging_CPU is broken due to similar issue from microsoft#23923 ### Description Migrate [Zip-Nuget Package Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=940&_a=summary) to 1ES ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Check list - [x] Issue with onnxruntime-Win-CPU-2022 - [x] [Spot Bug](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=697830&view=logs&j=6c6a898f-bbbb-5c72-8695-82b606149fa2&t=433f102b-5ed3-5fed-87a0-6107744ce9b1&l=81)
TedThemistokleous
added a commit
to ROCm/onnxruntime
that referenced
this pull request
Jun 5, 2025
Sync to Official Microsoft/Onnxruntime:main release tag for ROCm 7.0 builds * [Shape Inference] Add shape inference for QLinearAdd and QLinearMul ops (#24090) ### Description <!-- Describe your changes. --> Support shape inference for QLinearAdd and QLinearMul ops which were missing in symbolic_shape_infer.py ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This change is required to enable shape inference for models with "QLinearAdd" ops which are defined in com.microsoft domain and the shapes of which cannot be inferred using onnx shape_inference alone. Fixes issue https://github.com/microsoft/onnxruntime/issues/24028 --------- Signed-off-by: Praveen G <[email protected]> * [mobile] Add Android NuGet BrowserStack test to NuGet packaging pipeline (#23580) ### Description Follow-up to #23551 Adds the BrowserStack testing stage for Android to the NuGet packaging pipeline. This test tests that the NuGet package produced will be imported and work correctly on an Android device [Pipeline run that shows what a failing unit test would look like](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=670961&view=results) --------- Co-authored-by: Edward Chen <[email protected]> * [CPU] Add fp16 support to sparse attention (#24015) ### Description Add fp16 support to sparse attention ### Motivation and Context Generalize models for CPU and GPU * refactor mac CI pipelines (#24138) ### Description This PR refactors the mac CI pipeline: - Use composite action and reusable workflow to put together duplicated code - separate each EP * Address Windows CUDA build issue (#24149) ### Description Create a separate template overloads to address Windows Debug build warning 'unreachable code'. * [webgpu] add option to perserve device and enable in unittest (#24115) ### Description This PR introduced a new WebGPU EP option `preserveDevice`. Before this change, a WebGPU device will be destroyed when no inference session uses it. The destroy of a WebGPU device will cleanup both buffer cache and shader cache. After this option is introduced, when the option is ON (default value is OFF), the device will no longer be destroyed and will be always keep alive. This is helpful in 2 scenarios: - A server that will be always on - unittest so that bugs of incorrect shader cache may be detected. (thanks to @qjia7 for the suggestion) * [js/web] allow bundler import condition for not bundling wasm (#24014) ### Description <!-- Describe your changes. --> This gives a way for webapp developers to customize the bundler behavior regarding whether to bundle the wasm. To avoid treating ort-wasm-threaded-simd.jsep.mjs and ort-wasm-threaded-simd.jsep.wasm as dependencies during the process of bundler build, use import condition `onnxruntime-web-use-extern-wasm`. For webpack: ``` module.exports = { //... resolve: { conditionNames: ['onnxruntime-web-use-extern-wasm', 'import', 'module'], }, }; ``` For esbuild: ``` await esbuild.build({ //... conditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module'], }) ``` For rollup: ``` import { nodeResolve } from '@rollup/plugin-node-resolve'; export default { //... plugins: [nodeResolve({ exportConditions: ['onnxruntime-web-use-extern-wasm', 'import', 'module', 'development|production'] })] }; ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> - #24009 * [js] Add API for accessing metadata of a model's input/output (#23937) ### Description Add API for accessing metadata of a model's input/output. Currently, The implementation is only applied to web assembly backend and nodejs binding. For webgl, there is so far no plan to implement this API; for react-native, the implementation will be done later and is not included in this PR. #### Example usage: ```js const mySession = await ort.InferenceSession.create( ... ); console.log(`there are ${mySession.inputMetadata.length} inputs:`); for (let i = 0; i < mySession.inputMetadata.length; i++) { let info; if (mySession.inputMetadata[i].isTensor) { info = `tensor: ${mySession.inputMetadata[i].type}, shape: ${mySession.inputMetadata[i].shape}`; } else { info = `non-tensor`; } console.log(`input ${i}: ${mySession.inputMetadata[i].name}: ${info}`); } ``` possible output: ``` there are 1 inputs: input 0: input: tensor: float32, shape: [batch, 3, 224, 224] ``` Resolves: - #22682 - #22949 * add cache "onnxnodetests" for node tests (#24150) ### Description add cache "onnxnodetests" for node tests This fixes the random download network error for onnx node tests data. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * [Native WebGPU] Add Matmul (#24046) ### Description Add Native Matmul (`MatMulNaive`, `MatMulPacked` and `MatMulPackedVec4` ) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Upgrade Big Model pipeline CUDA from 11.8 to 12.x (#24156) ### Description Big model pipeline are still using cuda 11.8. This update the pipeline to use cuda 12.x. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Proper Error Message when fp16 model is used for Beam Search in CPU (#24151) ### Description Show proper error message when fp16 model is used for Beam Search in CPU. Before: ``` 2025-02-15 20:15:02.999160115 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: bad_function_call ``` After: ``` onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'beam_search' Status Message: onnxruntime/onnxruntime/contrib_ops/cpu/transformers/beam_search.cc:309 virtual onnxruntime::common::Status onnxruntime::contrib::transformers::BeamSearch::Compute(onnxruntime::OpKernelContext*) const BeamSearch does not support float16 model on CPU execution provider. Use float32 model or CUDA execution provider instead. ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/23728 * Change type len from int to size_t (#24157) ### Description As titled. ### Motivation and Context We have the last MatMul in phi-4-mini onnx which is b_shape = {3072, 200064} packed_b_size = MlasGemmPackBSize(N, K); it is `3072*200064*sizeof(float)=2458386432` This is larger than 2,147,483,647, it is out of the int boundary on a 32-bit system. Then len is negative. So we change the type to size_t, and the model can be loaded successfully after the change. * Limit the Pipeline ability to build cuda 11 (#24073) ### Description Limit the Pipeline ability to build cuda 11. 
However, references to CUDA 11 are not completely removed in this PR. We will keep them in case we decide to support both cuda 13 and cuda 12 in the future. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Move Linux CPU CI pipeline to Github Actions (#24154) ### Description Move the x64 part of "Linux CPU CI pipeline" to Github Actions * Bump vite from 6.2.1 to 6.2.3 in /js/web/test/e2e/exports/testcases/vite-default (#24167) Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.1 to 6.2.3. Release notes and the full commit list are available in the [vite changelog](https://github.com/vitejs/vite/blob/v6.2.3/packages/vite/CHANGELOG.md).
href="https://github.com/vitejs/vite/commit/98b3160fa5916189e756cd7c5aae87e0d8f1978e">98b3160</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19633">#19633</a></li> <li>fix(ssr): use optional chaining to prevent "undefined is not an object" happening in `ssrRewriteStac (<a href="https://github.com/vitejs/vite/commit/43097550a1aa8ff633c39fb197b5f9ac1222119b">4309755</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19612">#19612</a></li> <li>feat: show friendly error for malformed <code>base</code> (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19616">#19616</a>) (<a href="https://github.com/vitejs/vite/commit/2476391b2854aaa67d0ed317b6d0c462e68028f7">2476391</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19616">#19616</a></li> <li>feat(worker): show asset filename conflict warning (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19591">#19591</a>) (<a href="https://github.com/vitejs/vite/commit/367d968fbf584e9f0e17192b816e92e8045c6217">367d968</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19591">#19591</a></li> <li>chore: extend commit hash correctly when ambigious with a non-commit object (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19600">#19600</a>) (<a href="https://github.com/vitejs/vite/commit/89a62873243805518b672212db7e317989c5c197">89a6287</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19600">#19600</a></li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/vitejs/vite/commit/16869d7c9917eb58d9a0101e30064ab65e64fa91"><code>16869d7</code></a> release: v6.2.3</li> <li><a href="https://github.com/vitejs/vite/commit/f234b5744d8b74c95535a7b82cc88ed2144263c1"><code>f234b57</code></a> fix: fs raw query with query separators (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19702">#19702</a>)</li> <li><a href="https://github.com/vitejs/vite/commit/b12911edba0cd9edbad170a0940d37bb1e16ef2c"><code>b12911e</code></a> release: v6.2.2</li> <li><a href="https://github.com/vitejs/vite/commit/98b3160fa5916189e756cd7c5aae87e0d8f1978e"><code>98b3160</code></a> fix(preview): use preview https config, not server (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19633">#19633</a>)</li> <li><a href="https://github.com/vitejs/vite/commit/b31faab2a81b839e4b747baeb9c7a7cbb724f8d2"><code>b31faab</code></a> fix: await client buildStart on top level buildStart (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19624">#19624</a>)</li> <li><a href="https://github.com/vitejs/vite/commit/dc5395a27e44066ef7725278c4057d9f1071a53f"><code>dc5395a</code></a> fix(indexHtml): ensure correct URL when querying module graph (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19601">#19601</a>)</li> <li><a href="https://github.com/vitejs/vite/commit/2476391b2854aaa67d0ed317b6d0c462e68028f7"><code>2476391</code></a> feat: show friendly error for malformed <code>base</code> (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19616">#19616</a>)</li> <li><a 
href="https://github.com/vitejs/vite/commit/43097550a1aa8ff633c39fb197b5f9ac1222119b"><code>4309755</code></a> fix(ssr): use optional chaining to prevent "undefined is not an object" happe...</li> <li><a href="https://github.com/vitejs/vite/commit/363d691b4995d72f26a14eb59ed88a9483b1f931"><code>363d691</code></a> fix(deps): update all non-major dependencies (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19613">#19613</a>)</li> <li><a href="https://github.com/vitejs/vite/commit/d0aa833296668fc420a27a1ea88ecdbdeacdbce7"><code>d0aa833</code></a> fix(css): inline css correctly for double quote use strict (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19590">#19590</a>)</li> <li>Additional commits viewable in <a href="https://github.com/vitejs/vite/commits/v6.2.3/packages/vite">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [onnxruntime_perf_test] Fix custom_allocator_ destruction order. (#24136) Move the allocator data member declaration before the `Ort::Value` container data members that might use the allocator so that the `Ort::Value` containers will be destroyed first. `custom_allocator_` may be used as the allocator for the `Ort::Value`s in `test_inputs_` and `outputs_`. The allocator shouldn't be destroyed before `Ort::Value`s allocated with it are freed. * Fix layout transformer for FusedConv (#24169) ### Description Fix layout transformer for FusedConv. 
The current layout transformer will transform `FusedConv` (kMSDomain) into `FusedConv` (kMSInternalNHWCDomain) if the EP wants channels_last. However, kMSInternalNHWCDomain uses OpType `Conv` for both Conv and FusedConv, so `FusedConv` (kMSInternalNHWCDomain) is invalid (unregistered op). This PR fixes this and allows layout transformer change `FusedConv` (kMSDomain) into `Conv` (kMSInternalNHWCDomain). ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Migrate Zip-Nuget Package Pipeline to 1ES (#23609) Also, kleidail is disabled in MacOS and iOS packaging stage due to (#24152) (#24153) NuGet_Packaging_CPU is broken due to similar issue from #23923 ### Description Migrate [Zip-Nuget Package Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=940&_a=summary) to 1ES ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Check list - [x] Issue with onnxruntime-Win-CPU-2022 - [x] [Spot Bug](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=697830&view=logs&j=6c6a898f-bbbb-5c72-8695-82b606149fa2&t=433f102b-5ed3-5fed-87a0-6107744ce9b1&l=81) * Update the min GCC version (#24148) ### Description Update the min supported GCC version to 11.1. ### Motivation and Context In order to utilize new CPU instructions, we need to use new compilers. For example, our MLAS code needs bfloat16 support for arm, which requires GCC version >=10. And some other code requires GCC version >=11.1. Also, our CI pipelines only tests the code with GCC 11,12 and 14. Therefore this PR increase the min GCC version to 11.1. Will update it to 12 once we deprecate CUDA 11 pipelines * [QNN EP] ARM64EC python package remove --vcpkg in build (#24174) --use_vcpkg option seems to be causing problems for --arm64ec python packages (onnxruntime-qnn) session creation crashes for packages built with --use_vcpkg. the released onnxruntime-qnn 1.21.0 python wheel for x64 (arm64ec) has this issue. removing --use_vcpkg while the issue is debugged in parallel. we plan to release a 1.21.1 onnxruntime-qnn x64 python wheel without --use_vcpkg to address the crash. https://github.com/microsoft/onnxruntime/issues/24082 * [WebGPU EP] Add GEMM implementation (#24023) Increases operator GEMM for WebGPU ep. --------- Co-authored-by: Xiaofei Han <[email protected]> Co-authored-by: Yulong Wang <[email protected]> * [wasm] remove --vcpkg in wasm build (#24179) ### Description There are slightly mismatch for the build flags for Web build pipeline when using vcpkg. A [fix](https://github.com/microsoft/onnxruntime/pull/24012) is on the way but for now we need to disable vcpkg for the next patch release. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * revise mac os pipeline to reduce the amount of jobs (#24177) ### Description - remove x86_64/Debug build in the matrix to reduce the amount of jobs - set max-parallel to 1 to avoid big backlogs (single PR will take longer but less traffic in the pipeine) * fix triggering for "Validate Gradle Wrapper" pipeline (#24181) ### Description currently it is triggered on every branch. 
* upgrade QNN to version 2.32.0.250228 (#23977) ### Description upgrade QNN to latest version 2.32.0.250228 * [JSEP] adjust edge case logic for scatternd (#24172) Fixes https://github.com/microsoft/onnxruntime/issues/24070 by explicitly restricting single-threaded, sequential execution in the case where `reduction=none && hasDuplicates`. * Make the custom nuget packaging pipeline 1ES commpliant. (#24191) * Disable KleidiAI in Python Packaging pipeline MacOS build (#24194) This is a workaround for a build error. See https://github.com/microsoft/onnxruntime/issues/24152. * Rolling back the python/cuda (#24170) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Remove all CG template from pipelines (#24193) ### Description Since we are adapting 1ES teamplate, we are remove the redundent CG steps from our pipelines ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Move Linux ARM64 CI pipeline and Linux DNNL CI pipeline to Github Actions (#24190) ### Description 1. Move Linux ARM64 CI pipeline and Linux DNNL CI pipeline to Github Actions 2. Refactor .github/workflows/linux_training.yml to use a template ### Motivation and Context * [webgpu-ep] Fix test_batchnorm_example (#24184) This fixes the missing component handling for the input and output variables in BatchNorm operator. * Further reduce work load for Mac CI pipeline (#24197) ### Description Further reduce work load for Mac CI pipeline ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Generate unique names for SliceSplit fusion. (#24217) ### Description Generate unique name for fused Split nodes. ### Motivation and Context The bug is manifested when the model features more than one Slice to Split fusion patterns and the nodes of the graph are nameless. This addresses https://github.com/microsoft/onnxruntime/issues/24203. * Fix the pipeline that failed because of vcpkg (#24226) ### Description - Pin VCPKG version for Github Actions pipelines - Update NDK to 28 because cmake 4.0 dropped the support for NDK 27. - Disable vcpkg temporarily for 2 ADO pipelines. * Improve Shape Inference for GQA (#24143) ### Description <!-- Describe your changes. --> For GroupQueryAttention op, if the input total_sequence_length is a constant, we can infer the shape of output present_key/present_value `(batch_size, kv_num_heads, present_sequence_length, head_size)`. https://github.com/microsoft/onnxruntime/blob/5ed900e9712ce2f02e40c15b945d18453d1960d8/onnxruntime/contrib_ops/cpu/bert/group_query_attention_helper.h#L185 We know that from CPU EP, `present_sequence_length = max(past_sequence_length, total_sequence_length)`, and `batch_size, kv_num_heads, head_size` are the same as past_key/past_value. This inference is very important for WebNN EP, because WebNN only supports GQA for `present_sequence_length == past_sequence_length` and requires static shape for graph compilation. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> * Add React Native namespace back in for iOS (#24218) ### Description - adds react native namespace back in to the androidmanifest.xml ### Motivation and Context - reverses [this commit](https://github.com/microsoft/onnxruntime/commit/d8ed4da1dfee781919247d9ce001f246489c8f90) - missed [this comment](https://github.com/microsoft/onnxruntime/blob/2656671064a83564ddf5766f3449c2406259c3ef/js/react_native/android/build.gradle#L141) that explains that androidmanifest.xml is used for iOS while androidmanifestnew.xml is used for android * RoPE fp16 avx (#23772) ### Description RoPE to work with fp16 data types ### Motivation and Context this is need to improve GQA --------- Signed-off-by: liqunfu <[email protected]> Signed-off-by: Liqun Fu <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Migrate Linux GPU pipelines to Github Actions (#24232) * Migrate Web CI into github actions (#24219) ### Description This PR migrates the Web CI into github actions. * update the readme doc for the tool ep_weight_sharing_ctx_gen (#24233) ### Description update the readme to remove something Qnn specific for the tool ep_weight_sharing_ctx_gen --------- Co-authored-by: Edward Chen <[email protected]> * [WebGPU EP] If Implementation for WebGPU EP (#24242) Increases operator covereage for WebGPU EP. * Update linux-dnnl.yml: rename the pipeline (#24240) Update linux-dnnl.yml: rename the pipeline to Linux DNNL CI * [webgpu] Fix test_layer_normalization_2d_axis0 (#24223) The optional 'Mean' and 'InvStdDev' outputs of the LayerNormalization were not implemented. --------- Co-authored-by: Yulong Wang <[email protected]> * [webgpu] fix LayerNorm with empty input (#24244) ### Description This PR fixes test case `CudaKernelTest.LayerNorm_NullInput`, in which the input is 0-sized for LayerNorm. `context.Output()` need to be called before returning. * Bump actions/setup-python from 4 to 5 (#24251) * Bump actions/cache from 3 to 4 (#24250) Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4. 
<details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/cache/releases">actions/cache's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <h2>What's Changed</h2> <ul> <li>Update action to node20 by <a href="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/takost"><code>@takost</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1284">actions/cache#1284</a></li> <li>feat: save-always flag by <a href="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/to-s"><code>@to-s</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1242">actions/cache#1242</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="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/takost"><code>@takost</code></a> made their first contribution in <a href="https://redirect.github.com/actions/cache/pull/1284">actions/cache#1284</a></li> <li><a href="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/to-s"><code>@to-s</code></a> made their first contribution in <a href="https://redirect.github.com/actions/cache/pull/1242">actions/cache#1242</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v3...v4.0.0">https://github.com/actions/cache/compare/v3...v4.0.0</a></p> <h2>v3.4.3</h2> <h2>What's Changed</h2> <ul> <li>Bump <code>@actions/cache</code> to v4.0.2 by <a href="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/robherley"><code>@robherley</code></a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v3.4.2...v3.4.3">https://github.com/actions/cache/compare/v3.4.2...v3.4.3</a></p> <h2>v3.4.2</h2> <h2>What's Changed</h2> <blockquote> <p>[!IMPORTANT] As a reminder, there were important backend changes to release v3.4.0, see <a href="https://github.com/actions/cache/releases/tag/v3.4.0">those release notes</a> and <a href="https://github.com/actions/cache/discussions/1510">the announcement</a> for more details.</p> </blockquote> <ul> <li>Bump <code>@actions/cache</code> to v4.0.1 by <a href="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/robherley"><code>@robherley</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1554">actions/cache#1554</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v3.4.0...v3.4.2">https://github.com/actions/cache/compare/v3.4.0...v3.4.2</a></p> <h2>v3.4.1</h2> <blockquote> <p>[!WARNING] This version was incorrectly released using a SHA pointing to a newer version for <em><a href="https://redirect.github.com/github/roadmap/issues/592">immutable actions</a> only</em>. Please use <code>v3.4.2</code> (or <code>v3</code>) instead.</p> </blockquote> <h2>v3.4.0</h2> <h2>⚠️ Important Changes</h2> <p>The cache backend service has been rewritten from the ground up for improved performance and reliability. <a href="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/actions/cache">actions/cache</a> now integrates with the new cache service (v2) APIs.</p> <p>The new service will gradually roll out as of <strong>February 1st, 2025</strong>. The legacy service will also be sunset on the same date. Changes in these release are <strong>fully backward compatible</strong>.</p> <p><strong>We are deprecating some versions of this action</strong>. 
We recommend upgrading to version <code>v4</code> or <code>v3</code> as soon as possible before <strong>February 1st, 2025.</strong> (Upgrade instructions below).</p> <p>If you are using pinned SHAs, please use the SHAs of versions <code>v4.2.0</code> or <code>v3.4.0</code></p> <p>If you do not upgrade, all workflow runs using any of the deprecated <a href="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/actions/cache">actions/cache</a> will fail.</p> <p>Upgrading to the recommended versions will not break your workflows.</p> <p>Read more about the change & access the migration guide: <a href="https://github.com/actions/cache/discussions/1510">reference to the announcement</a>.</p> <h3>Minor changes</h3> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's changelog</a>.</em></p> <blockquote> <h1>Releases</h1> <h3>4.2.3</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.3 (obfuscates SAS token in debug logs for cache entries)</li> </ul> <h3>4.2.2</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.2</li> </ul> <h3>4.2.1</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.1</li> </ul> <h3>4.2.0</h3> <p>TLDR; The cache backend service has been rewritten from the ground up for improved performance and reliability. <a href="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/actions/cache">actions/cache</a> now integrates with the new cache service (v2) APIs.</p> <p>The new service will gradually roll out as of <strong>February 1st, 2025</strong>. The legacy service will also be sunset on the same date. Changes in these release are <strong>fully backward compatible</strong>.</p> <p><strong>We are deprecating some versions of this action</strong>. We recommend upgrading to version <code>v4</code> or <code>v3</code> as soon as possible before <strong>February 1st, 2025.</strong> (Upgrade instructions below).</p> <p>If you are using pinned SHAs, please use the SHAs of versions <code>v4.2.0</code> or <code>v3.4.0</code></p> <p>If you do not upgrade, all workflow runs using any of the deprecated <a href="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/actions/cache">actions/cache</a> will fail.</p> <p>Upgrading to the recommended versions will not break your workflows.</p> <h3>4.1.2</h3> <ul> <li>Add GitHub Enterprise Cloud instances hostname filters to inform API endpoint choices - <a href="https://redirect.github.com/actions/cache/pull/1474">#1474</a></li> <li>Security fix: Bump braces from 3.0.2 to 3.0.3 - <a href="https://redirect.github.com/actions/cache/pull/1475">#1475</a></li> </ul> <h3>4.1.1</h3> <ul> <li>Restore original behavior of <code>cache-hit</code> output - <a href="https://redirect.github.com/actions/cache/pull/1467">#1467</a></li> </ul> <h3>4.1.0</h3> <ul> <li>Ensure <code>cache-hit</code> output is set when a cache is missed - <a href="https://redirect.github.com/actions/cache/pull/1404">#1404</a></li> <li>Deprecate <code>save-always</code> input - <a href="https://redirect.github.com/actions/cache/pull/1452">#1452</a></li> </ul> <h3>4.0.2</h3> <ul> <li>Fixed restore <code>fail-on-cache-miss</code> not working.</li> </ul> <h3>4.0.1</h3> <ul> <li>Updated <code>isGhes</code> check</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/cache/commit/5a3ec84eff668545956fd18022155c47e93e2684"><code>5a3ec84</code></a> Merge pull request <a href="https://redirect.github.com/actions/cache/issues/1577">#1577</a> from salmanmkc/salmanmkc/4-test</li> <li><a href="https://github.com/actions/cache/commit/7de21022a7b6824c106a9847befcbd8154b45b6a"><code>7de2102</code></a> Update releases.md</li> <li><a href="https://github.com/actions/cache/commit/76d40dd347779762a1c829bbeeda5da4d81ca8c1"><code>76d40dd</code></a> Update to use the latest version of the cache package to obfuscate the SAS</li> <li><a href="https://github.com/actions/cache/commit/76dd5eb692f606c28d4b7a4ea7cfdffc926ba06a"><code>76dd5eb</code></a> update cache with main</li> <li><a href="https://github.com/actions/cache/commit/8c80c27c5e4498d5675b05fb1eff96a56c593b06"><code>8c80c27</code></a> new package</li> <li><a href="https://github.com/actions/cache/commit/45cfd0e7fffd1869ea4d5bfb54a464d825c1f742"><code>45cfd0e</code></a> updates</li> <li><a href="https://github.com/actions/cache/commit/edd449b9cf39c2a20dc7c3d505ff6dc193c48a02"><code>edd449b</code></a> updated cache with latest changes</li> <li><a href="https://github.com/actions/cache/commit/0576707e373f92196b81695442ed3f80c347f9c7"><code>0576707</code></a> latest test before pr</li> <li><a href="https://github.com/actions/cache/commit/3105dc9754dd9cd935ffcf45c091ed2cadbf42b9"><code>3105dc9</code></a> update</li> <li><a href="https://github.com/actions/cache/commit/9450d42d15022999ad2fa60a8b91f01fc92a0563"><code>9450d42</code></a> mask</li> <li>Additional commits viewable in <a href="https://github.com/actions/cache/compare/v3...v4">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) Dependabot will merge this PR once it's up-to-date and CI passes on it, as requested by @fs-eire. [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [QNN EP] Add platform-agnostic EP option to specify QNN backend, `backend_type` (#24235) Add a platform-agnostic EP option to specify QNN backend, `backend_type`. In typical usage, this should supersede the `backend_path` EP option. `backend_path` requires specifying a path to the QNN backend library which is different between Windows and non-Windows platforms (e.g., QnnCpu.dll vs. libQnnCpu.so). It will not be removed for backwards compatibility. It also provides the flexibility to specify an arbitrary backend path. * [webgpu] Fix opset-12 softmax nhwc issue (#24227) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Extend pyright exclude list in pyproject.toml (#24246) ### Description Improve the performance of file glob for pyright. This helps to improve VSCode performance (if pyright plugin is installed) * [js/web] Add Wasm Relaxed SIMD support to wasm backend (#22794) ### Description <!-- Describe your changes. --> Add Wasm Relaxed SIMD support. Use integer dot product instructions for QGemmU8X8. 1. Build with --enable_wasm_relaxed_simd 2. Use env.wasm.relaxedSimd to run it ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22533 --------- Co-authored-by: Yulong Wang <[email protected]> * Add shader key validation step in WebGPU CI pipeline (#24243) ### Description This PR adds a shader key validation step to the WebGPU CI pipeline. The shader key validation works in this way: - first, run onnxruntime_test_all with verbose logging, dumping the logs into a file - then, parse the file and found WebGPU EP program logs. The log contains the following information: - the shader cache key - the corresponding shader code The script will aggregate those information and make sure for each cache key, the corresponding shader code must be consistent. To make the validation work, this PR also modified a few things: - set the locale of `std::wclog` to ".UTF-8" to support Unicode characters. Otherwise the logger will fail and no longer output future logs. A fix is submitted in PR #24237 but there is a concern if this may potentially break some users. Setting inside onnxruntime_test_all is pretty safe. - re-enable the WebGPU device auto collect which was introduced in https://github.com/microsoft/onnxruntime/pull/24115. Now we have a better way to detect cache key inconsistency. ### Next Step The newly added test is marked as `continue-on-error: true`, which means even if it failed it does not block the CI pipeline. 
We should fix those failures one-by-one and eventually the test should pass. then we can remove the `continue-on-error: true` flag. * upgrade dawn version to 4cb1f9be152a4fa6bb695c08cd707ab078a1e2fb (#24247) ### Description Bump version of Dawn to 4cb1f9be152a4fa6bb695c08cd707ab078a1e2fb. ### Changes to the patches to Dawn: Removed patches because they are already merged into upstream or resolved in a different way: - (public) CMake fix to support Emscripten v4.0.3+ - (private) Fix external ref count for "external" device in emwgpu C++ implementation - (private) Allow "external" buffer in emwgpu C++ implementation Keep unchanged patches: - (private) Remove hard-coded CMAKE_OSX_DEPLOYMENT_TARGET in Dawn's CMake files Rewritten patches: - (public) Fix emwgpu C++ implementation for buffer destroy ### Corresponding changes in ORT - Dawn API changes - follow changes to `wgpu::Limits` - remove the usage of `DAWN_EMSCRIPTEN_TOOLCHAIN` - use `wgpu::InstanceDescriptor` in `wgpu::Instance` creation in WASM since it is supported now. * Bump dsaltares/fetch-gh-release-asset from 1.1.0 to 1.1.2 (#24248) Bumps [dsaltares/fetch-gh-release-asset](https://github.com/dsaltares/fetch-gh-release-asset) from 1.1.0 to 1.1.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/dsaltares/fetch-gh-release-asset/releases">dsaltares/fetch-gh-release-asset's releases</a>.</em></p> <blockquote> <h2>1.1.2</h2> <h2>What's Changed</h2> <ul> <li>feat: support unauthenticated requests by <a href="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/maciekmm"><code>@maciekmm</code></a> in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/59">dsaltares/fetch-gh-release-asset#59</a></li> <li>fix: 61 - upgrade to node 20 by <a href="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/dsaltares"><code>@dsaltares</code></a> in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/63">dsaltares/fetch-gh-release-asset#63</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="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/maciekmm"><code>@maciekmm</code></a> made their first contribution in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/59">dsaltares/fetch-gh-release-asset#59</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/dsaltares/fetch-gh-release-asset/compare/1.1.1...1.1.2">https://github.com/dsaltares/fetch-gh-release-asset/compare/1.1.1...1.1.2</a></p> <h2>1.1.1</h2> <h2>What's Changed</h2> <ul> <li>fix: 50 - actually default version to latest by <a href="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/dsaltares"><code>@dsaltares</code></a> in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/56">dsaltares/fetch-gh-release-asset#56</a></li> <li>Bump json5 from 1.0.1 to 1.0.2 by <a href="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/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/55">dsaltares/fetch-gh-release-asset#55</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="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/dependabot"><code>@dependabot</code></a> made their first contribution in <a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/pull/55">dsaltares/fetch-gh-release-asset#55</a></li> </ul> <p><strong>Full Changelog</strong>: <a 
href="https://github.com/dsaltares/fetch-gh-release-asset/compare/1.1.0...1.1.1">https://github.com/dsaltares/fetch-gh-release-asset/compare/1.1.0...1.1.1</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/dsaltares/fetch-gh-release-asset/commit/aa2ab1243d6e0d5b405b973c89fa4d06a2d0fff7"><code>aa2ab12</code></a> fix: 61 - upgrade to node 20 (<a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/issues/63">#63</a>)</li> <li><a href="https://github.com/dsaltares/fetch-gh-release-asset/commit/cdaf216b2a5baa0f20eecbf460912cc9947f2577"><code>cdaf216</code></a> feat: support unauthenticated requests (<a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/issues/59">#59</a>)</li> <li><a href="https://github.com/dsaltares/fetch-gh-release-asset/commit/5d24fa77c1ae2e1e1dea54677d267f127d5de53a"><code>5d24fa7</code></a> chore: remove support notice</li> <li><a href="https://github.com/dsaltares/fetch-gh-release-asset/commit/a40c8b4a0471f9ab81bdf73a010f74cc51476ad4"><code>a40c8b4</code></a> Bump json5 from 1.0.1 to 1.0.2 (<a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/issues/55">#55</a>)</li> <li><a href="https://github.com/dsaltares/fetch-gh-release-asset/commit/5a71312bcb7a436e89a7dd26123cdbdd7b3df709"><code>5a71312</code></a> fix: 50 - actually default version to latest (<a href="https://redirect.github.com/dsaltares/fetch-gh-release-asset/issues/56">#56</a>)</li> <li>See full diff in <a href="https://github.com/dsaltares/fetch-gh-release-asset/compare/1.1.0...1.1.2">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump vite from 6.2.3 to 6.2.4 in /js/web/test/e2e/exports/testcases/vite-default (#24255) Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.3 to 6.2.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/vitejs/vite/releases">vite's releases</a>.</em></p> <blockquote> <h2>v6.2.4</h2> <p>Please refer to <a href="https://github.com/vitejs/vite/blob/v6.2.4/packages/vite/CHANGELOG.md">CHANGELOG.md</a> for details.</p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/vitejs/vite/blob/v6.2.4/packages/vite/CHANGELOG.md">vite's changelog</a>.</em></p> <blockquote> <h2><!-- raw HTML omitted -->6.2.4 (2025-03-31)<!-- raw HTML omitted --></h2> <ul> <li>fix: fs check in transform middleware (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19761">#19761</a>) (<a href="https://github.com/vitejs/vite/commit/7a4fabab6a3aa24c89144e15a13d78f92b52e588">7a4faba</a>), closes <a href="https://redirect.github.com/vitejs/vite/issues/19761">#19761</a></li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/vitejs/vite/commit/037f801075ec35bb6e52145d659f71a23813c48f"><code>037f801</code></a> release: v6.2.4</li> <li><a href="https://github.com/vitejs/vite/commit/7a4fabab6a3aa24c89144e15a13d78f92b52e588"><code>7a4faba</code></a> fix: fs check in transform middleware (<a href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19761">#19761</a>)</li> <li>See full diff in <a href="https://github.com/vitejs/vite/commits/v6.2.4/packages/vite">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [WebGPU EP] fixes bugs in split implementation (#24259) Fixes the following errors: ``` [ONNXRuntimeError] : 1 : FAIL : WebGPU validation failed. Error while parsing WGSL: :48:1 error: unexpected token } ^ - While validating [ShaderModuleDescriptor] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]). ``` ``` [E:onnxruntime:sam, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running Split node. Name:'/Split_1' Status Message: WebGPU validation failed. Error while parsing WGSL: :62:14 error: cannot index type 'u32' index -= uniforms.sizes_in_split_axis[output_number - 1u]; ``` * Bump microsoft/onnxruntime-github-actions from 35f8bd42417991aa46577e9c32e445af4250f098 to f3d90afe522476c858909e0de2be0b12bc890068 (#24249) Bumps [microsoft/onnxruntime-github-actions](https://github.com/microsoft/onnxruntime-github-actions) from 35f8bd42417991aa46577e9c32e445af4250f098 to f3d90afe522476c858909e0de2be0b12bc890068. 
<details> <summary>Commits</summary> <ul> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/f3d90afe522476c858909e0de2be0b12bc890068"><code>f3d90af</code></a> update</li> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/fe4bffdebbaf16477883ba661ecfeeeb5703c85a"><code>fe4bffd</code></a> update</li> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/2cf46f409099e5a27977bbece1c89f3d6dca6a1b"><code>2cf46f4</code></a> update</li> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/bb6b16e409684ffd0f46b35e8e217ce6ed72097c"><code>bb6b16e</code></a> update</li> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/0c8c2ab4b6ca3be3c8287a0e0549038ecca38d7d"><code>0c8c2ab</code></a> update</li> <li><a href="https://github.com/microsoft/onnxruntime-github-actions/commit/f861fd3c0d13dedcf2fae39ad7023acaad97532d"><code>f861fd3</code></a> update</li> <li>See full diff in <a href="https://github.com/microsoft/onnxruntime-github-actions/compare/35f8bd42417991aa46577e9c32e445af4250f098...f3d90afe522476c858909e0de2be0b12bc890068">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update xcode and iphoneSimulatorVersion after MacOS-14 (#24260) ### Description Update xcode and iphoneSimulatorVersion after MacOS-14 ### Motivation and Context iOS packaging pipeline and Github Action were still using the old xcode version after https://github.com/microsoft/onnxruntime/pull/23293 * Exclude onnxruntime-inference-examples directory from Component Gover… (#24258) …nance ### Description Exclude onnxruntime-inference-examples directory from Component Governance ### Motivation and Context onnxruntime-inference-examples is a extneral repos * [VitisAI] Fixed include error. (#24199) ### Description <!-- Describe your changes. --> include mp11 as it is used for provider related headers ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> VitisAI failed to be built on latest g++ version. * Migrate pull:wasm to github action (#24269) ### Description The ADO Web CI is migrated to Github Actions now. This PR makes the corresponding changes to the `npm run pull:wasm` command to use the new Github Action. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> * Ensure to use correct GPU device in RunSince when it's invoked by new thread (#24192) Running cuda kernel on incorrect GPU device will end up getting CUDA error: `invalid resource handle.` CUDA EP and TRT EP both have this issue when ExecutionMode::ORT_PARALLEL is enabled. Repro code: ````python provider = [ [ ('TensorrtExecutionProvider', { 'device_id': 0, }), ], [ ('TensorrtExecutionProvider', { 'device_id': 1, }), ] ] class ThreadObj(): def __init__(self, model_path: str, iterations: int, idx: int): ... sess_opt = ort.SessionOptions() sess_opt.execution_mode = ort.ExecutionMode.ORT_PARALLEL self.inference_session = ort.InferenceSession(model_path, sess_opt, provider[idx % 2]) def warmup(self): self.inference_session.run(None, self.input) def run(self, thread_times, threads_complete): for iter in range(self.iterations): self.inference_session.run(None, self.input) def thread_target(obj, thread_times, threads_complete): obj.run(thread_times, threads_complete) ... iterations = 500 num_threads = 13 t_obj_list = [] thread_list = [] for tidx in range(num_threads): obj = ThreadObj(model_path, iterations, tidx) t_obj_list.append(obj) obj.warmup() for t_obj in t_obj_list: thread = threading.Thread(target=thread_target, daemon=True, args=(t_obj,thread_times,threads_complete,)) thread.start() thread_list.append(thread) ... 
```` The reason is when the inference session is initialized, it can be bound to device > 0, whereas when running the inference, i.e. RunSince can be invoked by a new thread and new threads default to using device 0, then we will hit the error of using the incorrect GPU device. This PR provides a general fix for both CUDA EP and TRT EP to call cudaSetDeivce in RunSince. * Adding build-system to pyproject.toml (#24216) * [WebGPU EP] Implements ceil mode for Average Pool (#24270) Ceil mode is required for rtdetr model. The actual ceil mode calculation is already implemented in the PoolAttributes::ComputeOutputSize() method from pool_attributes.h under CPU EP. * Pin vcpkg version (#24284) ### Description Pin vcpkg version. Yesterday vcpkg-tool made a new release that broke all our Linux pipelines. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Support load TensorRT V3 plugin (#24211) ### Description TensorRT V3 plugin is not able to load in TensorRT EP. The change deprecates `getPluginCreatorList` with `getAllCreators` to load V1 and V3 plugin creators. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Support load TensorRT plugin. Reference: https://github.com/NVIDIA/TensorRT/blob/8c6d69ddec0b2feff12f55472dc5d55cb6861d53/python/src/infer/pyPlugin.cpp#L2971C1-L2995C6 * Expose TRT preview features as EP option (#24212) ### Description <!-- Describe your changes. --> Expose TRT preview features as EP option. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Add support to turn on TensorRT preview features. For example, > If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, **PreviewFeature::kALIASED_PLUGIN_IO_10_03** must be enabled. --------- Co-authored-by: Vcpkg Builder <builder@vcpkg> * [webgpu] test_layer_normalization_3d_axis0_epsilon (#24276) This uses dummy override shapes to bypass the 'components' check. * [webgpu][dawn API optimization] reduce number of calls to wgpuDeviceHasFeature (#24281) ### Description This PR is one of a series of changes for optimization of Dawn API usage. Currently, the WebGPU EP has some suboptimal code paths that result in unnecessary Dawn API calls. Reducing the number of calls to those API will help improve the performance of the WebGPU EP, especially on WebAssembly. This PR optimizes the usage of `wgpuDeviceHasFeature`. * Bump next from 15.2.3 to 15.2.4 in /js/web/test/e2e/exports/testcases/nextjs-default (#24283) Bumps [next](https://github.com/vercel/next.js) from 15.2.3 to 15.2.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/vercel/next.js/releases">next's releases</a>.</em></p> <blockquote> <h2>v15.2.4</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. 
It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Match subrequest handling for edge and node (<a href="https://redirect.github.com/vercel/next.js/issues/77474">#77474</a>)</li> <li>exclude images and static media from dev origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77417">#77417</a>)</li> <li>ensure /__next middleware URLs are included in the origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77416">#77416</a>)</li> <li>remove direct ip/port bypass in dev origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77414">#77414</a>)</li> <li>switch development origin verification to be opt-in rather than opt-out (<a href="https://redirect.github.com/vercel/next.js/issues/77395">#77395</a>)</li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="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/ijjk"><code>@ijjk</code></a> and <a href="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/ztanner"><code>@ztanner</code></a> for helping!</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/vercel/next.js/commit/804aa35c71cc65cf3ddc29cdadcd29f06b368285"><code>804aa35</code></a> v15.2.4</li> <li><a href="https://github.com/vercel/next.js/commit/ecb72ee9ead86aaa1e3992b427bfb43b046aa08d"><code>ecb72ee</code></a> Match subrequest handling for edge and node (<a href="https://redirect.github.com/vercel/next.js/issues/77474">#77474</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/25f810b596cdb6875d1f068ae8d203f1a5df7a46"><code>25f810b</code></a> exclude images and static media from dev origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77417">#77417</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/d9bcb833dd2a8dd5c13f30775d688f7015cd75b1"><code>d9bcb83</code></a> ensure /__next middleware URLs are included in the origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77416">#77416</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/cfeaa86fa718f1fecce9fb5f5fad3c310117fc53"><code>cfeaa86</code></a> remove direct ip/port bypass in dev origin check (<a href="https://redirect.github.com/vercel/next.js/issues/77414">#77414</a>)</li> <li><a href="https://github.com/vercel/next.js/commit/f84730266087817b39c9b87c42ccf1c3bb7de0c5"><code>f847302</code></a> switch development origin verification to be opt-in rather than opt-out (<a href="https://redirect.github.com/vercel/next.js/issues/77395">#77395</a>)</li> <li>See full diff in <a href="https://github.com/vercel/next.js/compare/v15.2.3...v15.2.4">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump image-size from 1.1.1 to 1.2.1 in /js/react_native/e2e (#24278) Bumps [image-size](https://github.com/image-size/imag…
Description
Use the 'desktop only' solution in the GPU C# packaging builds. We don't need to include any MAUI support for those builds.
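As a rough sketch of the idea only (not taken from this PR's diff; the solution file name and configuration below are assumptions for illustration): the GPU packaging stage can point the C# build at a desktop-only solution so the MAUI project heads and their mobile workloads are never loaded on the packaging agents.
```sh
# Illustrative sketch only: file names and configuration are assumptions, not this PR's diff.
# Building the full C# solution would also load the MAUI project heads and require the
# Android/iOS/MacCatalyst workloads to be installed on the packaging agent:
#   dotnet build csharp/OnnxRuntime.CSharp.sln -c Release
# Pointing the GPU packaging build at a desktop-only solution keeps the MAUI projects out entirely:
dotnet build csharp/OnnxRuntime.DesktopOnly.CSharp.sln -c Release
```
This matches the description above: the GPU packages do not need any MAUI support, so the GPU packaging builds can skip those projects altogether.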
Motivation and Context