[webgpu] Apply Flash Attention if sliding window exceeds KV cache length #25594


Merged

Conversation

daijh
Contributor

@daijh daijh commented Jul 30, 2025

Description

#25372 adds sliding window support for Group Query Attention, disabling Flash Attention as it's not yet supported.

This PR adds a check for the sliding window and applies Flash Attention when the window size exceeds the KV cache length or total sequence length.
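The idea behind the check is that a sliding window only changes the attention pattern when it is smaller than the number of tokens being attended to; if the window already covers the whole KV cache (or total sequence), the masking is a no-op and the Flash Attention path can be taken unchanged. A minimal sketch of that predicate, with hypothetical names (`can_use_flash_attention`, `sliding_window`, `total_sequence_length`) that do not come from the PR itself:

```python
def can_use_flash_attention(sliding_window: int, total_sequence_length: int) -> bool:
    """Hypothetical predicate illustrating the PR's condition.

    If no sliding window is configured, or the window is at least as long
    as the total sequence / KV cache, windowed masking has no effect and
    Flash Attention can be applied as-is.
    """
    if sliding_window <= 0:  # no sliding window configured
        return True
    # Window covers every cached token: masking would be a no-op.
    return sliding_window >= total_sequence_length
```

With a 4096-token window and a 1024-token sequence, for example, every token already attends to the full cache, so disabling Flash Attention would cost performance for no semantic difference.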

Motivation and Context

See above.

@daijh
Contributor Author

daijh commented Jul 30, 2025

@guschmue @fs-eire @qjia7

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Jul 30, 2025
qjia7
qjia7 previously approved these changes Jul 31, 2025
Contributor

@qjia7 qjia7 left a comment


LGTM, thanks.

@guschmue
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@daijh
Contributor Author

daijh commented Aug 1, 2025

The failed checks look like a CI infrastructure issue based on the logs; please help re-run them.

@guschmue guschmue merged commit 7cc93cf into microsoft:main Aug 1, 2025
87 of 91 checks passed
snnn pushed a commit that referenced this pull request Aug 1, 2025
…gth (#25594)

snnn added a commit that referenced this pull request Aug 1, 2025
This PR cherry-picks some pipeline changes from the main branch to the
1.23.0 release branch.


- **[build] disable CodeQL for NPM Packaging Pipeline (#25614)**
- **Refactor Java Test Pipeline (#25608)**
- **[build] upgrade Node.js for NPM packaging pipeline (#25568)**

And a WebGPU change:

- **[webgpu] Apply Flash Attention if sliding window exceeds KV cache
length (#25594)**
@daijh daijh deleted the supports-sliding-window-for-flash-attention branch August 2, 2025 00:52
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
…gth (microsoft#25594)

Labels
ep:WebGPU ort-web webgpu provider