Conversation

@DDEle (Contributor) commented Oct 13, 2025

Motivation

To optimize the FMHA backward pass for HDim=48 (head dimension 48) cases.

Technical Details

  • Update to ROCm/composable_kernel@95bdc74
  • Update to ROCm/composable_kernel@013ba3c
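The two updates above are plain submodule pin bumps. A self-contained sketch of that motion in a throwaway directory (all repo names and paths here are illustrative stand-ins, not aiter's actual layout):

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# "upstream" stands in for ROCm/composable_kernel (illustrative only).
git init -q upstream
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "first"
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "second"
pin=$(git -C upstream rev-parse HEAD~1)   # the commit to pin the superproject to

git init -q super
cd super
# Newer git requires explicitly allowing file-protocol submodule URLs.
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" 3rdparty/upstream
git -C 3rdparty/upstream checkout -q "$pin"   # move the submodule to the pinned commit
git add 3rdparty/upstream                     # stage the updated gitlink
git -c user.email=ci@example.com -c user.name=ci \
    commit -q -m "Bump upstream submodule to $pin"
git submodule status                          # shows the pinned commit
```

The superproject records only the gitlink (the pinned commit hash), which is why a PR like this one shows up as a one-line submodule change per bump.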

Test Plan

MAX_JOBS=$(nproc) pytest op_tests/test_mha.py -v

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings, October 13, 2025 07:16

Copilot AI left a comment


Pull Request Overview

This PR updates the composable_kernel submodule to implement FMHA (Fused Multi-Head Attention) backward pass optimizations specifically for D48 configurations on GFX950 hardware.

  • Updates composable_kernel submodule commit to include FMHA BWD optimizations
  • Targets D48 dimension size optimizations for GFX950 GPU architecture
  • Focuses on backward pass performance improvements for attention mechanisms


@DDEle requested a review from slippedJim, October 13, 2025 09:18
@DDEle (Contributor, Author) commented Oct 16, 2025

It seems that test_gemm_a8w8_blockscale_mi350 fails (core dump) with high probability under this CK update, whereas on the current aiter main branch (with its linked CK version) the core dump occurs only with low probability.

Another pattern of this failure: the problem appears only on the first run. test_gemm_a8w8_blockscale_mi350 runs smoothly on subsequent runs (once the JIT cache exists).

@valarLip (Collaborator)

Test failed: op_tests/test_mha.py?

@valarLip self-assigned this Oct 17, 2025