
Conversation

@defei-coder defei-coder commented Dec 13, 2023

FA2 supports a deterministic computation feature; fixes issue #429

  • Add a deterministic flag, like in FA1, to control whether the results are deterministic.
  • This feature needs extra workspace for semaphores.
  • For convenience, users may omit the extra workspace; it will then be allocated automatically and a warning message will be emitted.
  • We do not support deterministic=True and local=True at the same time; we will work around this limitation later.

Example (usage and test):

    # Allocate the int32 semaphore workspace required by the deterministic bwd.
    workspace = torch.zeros(flash_get_bwd_workspace_size_func(batch_size, nheads, seqlen_q, d), device=device, dtype=torch.int32)
    q = torch.randn(batch_size, seqlen_q, nheads, d, device=device, dtype=dtype, requires_grad=True)
    k = torch.randn(batch_size, seqlen_k, nheads, d, device=device, dtype=dtype, requires_grad=True)
    v = torch.randn(batch_size, seqlen_k, nheads, d, device=device, dtype=dtype, requires_grad=True)
    out = flash_attn_func(q, k, v, 0.0, causal=causal, window_size=window_size, workspace=workspace, deterministic=True)
    g = torch.randn_like(out)

    (
        dq,
        dk,
        dv,
     ) = torch.autograd.grad(out, (q, k, v), g, retain_graph=True)

    (
        dq1,
        dk1,
        dv1,
    ) = torch.autograd.grad(out, (q, k, v), g)
    # The two backward passes produce bitwise-identical dq.
    assert (dq - dq1).abs().max().item() == 0

This way, the result of dq is deterministic. We run the backward pass twice, obtaining dq and dq1, and the two results are completely identical.

@evanluyifan

Since the extra workspace can be reused at runtime, to avoid frequent redundant GPU driver alloc/free calls, we may need to expose a Python API for allocating the extra workspace and let the framework's allocator handle GPU memory reuse.
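A minimal sketch of that idea, assuming the flash_get_bwd_workspace_size_func helper and workspace kwarg proposed in this PR: allocating through torch lets PyTorch's caching allocator serve repeated requests of the same size from its memory pool instead of issuing a driver alloc/free every iteration.

    import torch

    def get_bwd_workspace(batch_size, nheads, seqlen_q, d, device="cuda"):
        # flash_get_bwd_workspace_size_func is the size helper proposed in this PR.
        # torch.zeros goes through PyTorch's caching allocator, so repeated
        # allocations of the same size are reused rather than hitting the driver.
        n = flash_get_bwd_workspace_size_func(batch_size, nheads, seqlen_q, d)
        return torch.zeros(n, device=device, dtype=torch.int32)

    # Reuse one workspace across training iterations:
    # ws = get_bwd_workspace(batch_size, nheads, seqlen_q, d)
    # out = flash_attn_func(q, k, v, 0.0, causal=causal, workspace=ws, deterministic=True)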

@hwu36

hwu36 commented Dec 13, 2023

https://research.colfax-intl.com/nvidia-hopper-flashattention-2/ replaced Ampere mma.sync and cp.async with TMA and WGMMA on Hopper. It needs a smaller tile size to prevent register spilling and uses warp specialization to hide more data movement.

@jayhshah

@evanluyifan

Hi @tridao, this PR has been open for a while; could you help with the code review?
BTW, this change has been tested on cases from Meituan, and the bwd outputs are deterministic.


tridao commented Dec 21, 2023

Thanks, just got back from some travel, let me review it this week.


tridao commented Dec 24, 2023

Thanks, I've incorporated some of the ideas here with a slightly different approach, and there's an option for deterministic bwd as of v2.4.1. I've acknowledged your contribution in the README.
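For reference, a minimal sketch of enabling the deterministic backward in upstream flash-attn v2.4.1+ (assuming the deterministic keyword of flash_attn_func added in that release; no explicit workspace argument is needed there):

    import torch
    from flash_attn import flash_attn_func

    q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
    k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
    v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)

    # deterministic=True selects the deterministic backward kernel (slightly slower).
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True, deterministic=True)
    g = torch.randn_like(out)

    dq1, = torch.autograd.grad(out, q, g, retain_graph=True)
    dq2, = torch.autograd.grad(out, q, g)
    # Expected to be bitwise-identical across repeated backward passes.
    assert (dq1 - dq2).abs().max().item() == 0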
