
Conversation

yzh119
Collaborator

@yzh119 yzh119 commented Apr 24, 2025

Mainly adapted from cutlass examples.

@johnnynunez

johnnynunez commented Apr 25, 2025

@yzh119 cutlass 3.9.0 is officially released https://github.com/NVIDIA/cutlass/releases/tag/v3.9.0

@yzh119
Collaborator Author

yzh119 commented Apr 25, 2025

Hi @johnnynunez, yes, we upgraded the 3rd-party dependency to cutlass 3.9 several weeks ago: #997.

This PR changes the original implementation to unblock the features we need.

@hwu36

hwu36 commented May 2, 2025

We will have 3.9.2 this weekend.

@yzh119
Collaborator Author

yzh119 commented May 2, 2025

@hwu36 good to know, we will adopt 3.9.2 as soon as it's ready.

@yzh119 yzh119 force-pushed the cutlass-fmha-blackwell branch from 5849003 to eef0ada on May 4, 2025 15:44
Collaborator

@cyx-6 cyx-6 left a comment


Looks great!

@yzh119 yzh119 merged commit 9a05c92 into flashinfer-ai:main May 13, 2025
1 of 2 checks passed
@yzh119
Collaborator Author

yzh119 commented May 13, 2025

Major changes made to the original cutlass example:

  1. Add barrier_O and reuse shared memory between the mainloop and epilogue to reduce shared memory usage (the original smem layout cannot support head_dim_qk=192 with head_dim_vo=128).
  2. Separate pipeline_K/pipeline_V and smem_k/smem_v to support head_dim_qk=192 with head_dim_vo=128.
  3. Remove the need for padding in the tma_load of q/k/v (we observed significant overhead from padding). The padding trick is still used in tma_store, where the overhead is tolerable because we can allocate a padded buffer without extra data movement.
  4. The original persistent tile scheduler performs poorly for causal attention, so this PR adds a naive tile scheduler, which is slightly better. We have a work-in-progress ahead-of-time static scheduler (like the earlier flashinfer plan function) that reaches better performance on causal attention and will be upstreamed later.
  5. Change the mask mode to an inference-style causal mask.

@zhyncs zhyncs deleted the cutlass-fmha-blackwell branch May 13, 2025 09:21