Fix/deterministic dk dv #1678
Conversation
@tridao This can run correctly on our GQA case.
@yuWeiCute Hi, I have cloned this PR and given it a try, but I found that the following block in test_flash_attn.py is now commented out:
# import flash_attn_3_cuda
# dq, dk, dv, softmax_d, dq_accum, dk_accum, dv_accum = flash_attn_3_cuda.bwd(
# g,
# q,
# k,
# v,
# out,
# lse,
# None,
# None,
# None,
# d ** (-0.5),
# causal,
# window_size[0], window_size[1],
# softcap,
# deterministic,
# 0, # sm_margin
# )
https://github.com/yuWeiCute/flash-attention-hopper/blob/a9a3170fc98cbd22a4cc870937b390f3d483f1eb/hopper/test_flash_attn.py#L228-L245
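For context, here is a minimal sketch of the determinism check this feature targets: run the backward pass twice on identical inputs and require bit-identical dq/dk/dv. This assumes the hopper flash_attn_interface.flash_attn_func accepts a deterministic flag mirroring the deterministic argument of the raw flash_attn_3_cuda.bwd call above; the exact import path, signature, and return value may differ between builds.

```python
import torch
from flash_attn_interface import flash_attn_func  # hopper build; import path assumed

def run_bwd(q, k, v, g, deterministic=True):
    # Fresh leaf tensors so each run accumulates gradients from scratch.
    q, k, v = (t.detach().clone().requires_grad_() for t in (q, k, v))
    res = flash_attn_func(q, k, v, causal=True, deterministic=deterministic)
    out = res[0] if isinstance(res, tuple) else res  # some builds also return the LSE
    out.backward(g)
    return q.grad, k.grad, v.grad

torch.manual_seed(0)
device, dtype = "cuda", torch.bfloat16
# GQA layout: 32 query heads sharing 8 KV heads, head dim 128.
q = torch.randn(2, 4096, 32, 128, device=device, dtype=dtype)
k = torch.randn(2, 4096, 8, 128, device=device, dtype=dtype)
v = torch.randn(2, 4096, 8, 128, device=device, dtype=dtype)
g = torch.randn_like(q)

dq1, dk1, dv1 = run_bwd(q, k, v, g)
dq2, dk2, dv2 = run_bwd(q, k, v, g)
# With deterministic=True, dk/dv (and dq) should be reproduced bit-for-bit.
assert torch.equal(dk1, dk2) and torch.equal(dv1, dv2) and torch.equal(dq1, dq2)
```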
@tridao We at Tencent have tried this approach; it proved to be deterministic and efficient in our cases. Could you please share when this PR might be merged? We are looking forward to the official GQA bwd deterministic version!
Well done! Looking forward to this new approach for deterministic bwd.
@tridao Hi, we at ByteDance have identified this as a valuable feature for future releases. Is there an estimated timeline for when it might be merged? Thanks.
Cool, will review & merge this weekend.
Force-pushed from 50c1d49 to 78ab9e8
Hi @tridao, thanks for your support; I really appreciate your effort in reviewing this code change. I noticed that deterministic mode still isn't supported in some cases, particularly when the head dimension equals 256. To fix this, I've added a new commit.
Let me know if you have any feedback on these updates.
@tridao Hi Tri, any suggestions for this PR?
Any updates?
Upon inspecting dv_semaphore during debugging, it was found that some dv_semaphore values were not initialized to zero. The issue was resolved by changing torch::empty to torch::zeros, after which the problem no longer occurred.
The semaphore is allocated with shape [seq_len / kBlockN, batch_size, num_head_kv], but during accumulation num_batch = 1, leading to a mismatch in the data dimensions.
Fixes #1596
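To illustrate the allocation pitfall described above: torch::empty returns uninitialized memory, so a semaphore used to coordinate cross-block accumulation of dk/dv can start from stale nonzero counters, while torch::zeros guarantees every counter starts at 0. The actual fix lives in the C++ extension; below is a Python sketch of the same idea, with illustrative values for kBlockN and the tensor dimensions, using the shape quoted above.

```python
import math
import torch

# Illustrative values only; kBlockN is the kernel's N-tile size.
seq_len, batch_size, num_head_kv, kBlockN = 4096, 2, 8, 128
shape = (math.ceil(seq_len / kBlockN), batch_size, num_head_kv)

# torch.empty leaves the buffer uninitialized: stale nonzero values here would
# put the semaphore-gated accumulation of dk/dv into an inconsistent state.
bad_semaphore = torch.empty(shape, dtype=torch.int32, device="cuda")

# torch.zeros guarantees every counter starts at 0, which is what the fix does
# on the C++ side (torch::empty -> torch::zeros).
dv_semaphore = torch.zeros(shape, dtype=torch.int32, device="cuda")
```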