
Conversation

Collaborator

@yongwww yongwww commented Aug 30, 2025

📌 Description

This depends on #1608, mainly the CUTLASS fp8 GEMM support for sm120/121; will rebase after #1608 lands.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

self,
inputs: List[torch.Tensor],
tactic: int = -1,
do_preparation: bool = False,
Contributor


seems like this parameter is unused?

Collaborator Author


Good catch. They're part of the TunableRunner interface; keeping them for consistency with the other runners.
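The interface-consistency point above can be sketched roughly as follows. This is a minimal, hypothetical illustration (class and method names are assumed, not the actual flashinfer TunableRunner API): every concrete runner keeps the full base signature even when a parameter like `do_preparation` is unused by that backend.

```python
from typing import List


class TunableRunner:
    """Hypothetical sketch of a tunable-runner base interface;
    the real flashinfer TunableRunner may differ."""

    def forward(self, inputs: List[float], tactic: int = -1,
                do_preparation: bool = False) -> List[float]:
        raise NotImplementedError


class ScaleRunner(TunableRunner):
    # Keep the full base signature for consistency with sibling runners,
    # even though do_preparation is unused by this particular backend.
    def forward(self, inputs: List[float], tactic: int = -1,
                do_preparation: bool = False) -> List[float]:
        del do_preparation  # accepted but intentionally ignored here
        scale = 2.0 if tactic < 0 else float(tactic)
        return [x * scale for x in inputs]
```

Keeping the signature uniform lets an autotuner drive every runner through the same call site without special-casing which backend consumes which argument.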

@yongwww yongwww force-pushed the sm120_cutlass_fp8_gemm branch from 296583e to c172dfd Compare September 3, 2025 00:17
@yongwww yongwww marked this pull request as ready for review September 3, 2025 00:29
Collaborator

aleozlx commented Sep 3, 2025

Looks good, no further comments from me.

constexpr int SCALE_GRANULARITY_M = 1; /* Always 1 for SM120 */ \
constexpr int SCALE_GRANULARITY_K = 128; /* Always 128 for SM120 per CUTLASS requirement */ \
if (scale_granularity_m != 1) { \
TORCH_CHECK(false, "SM120 only supports scale_granularity_m=1"); \
Collaborator

@yzh119 yzh119 Sep 3, 2025


Collaborator Author


It will run into a static assertion failure: "Scale Granularity M must evenly divide the tile shape M."

Collaborator

@yzh119 yzh119 Sep 4, 2025


Collaborator Author

@yongwww yongwww Sep 4, 2025


Right. I used a standalone test (not in this PR) to trigger that error message. I'll go with the #1610 (comment).
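The constraint behind that static assertion can be mirrored host-side in a short sketch. The function name and the call site below are hypothetical (the real check lives in CUTLASS C++ as a `static_assert`); the tile sizes are illustrative values, not the kernel's actual configuration.

```python
def check_scale_granularity(scale_granularity_m: int, tile_m: int) -> None:
    """Hypothetical host-side mirror of the CUTLASS static assertion
    'Scale Granularity M must evenly divide the tile shape M.'"""
    if tile_m % scale_granularity_m != 0:
        raise ValueError(
            f"scale_granularity_m={scale_granularity_m} must evenly "
            f"divide tile M={tile_m}")


# scale_granularity_m=1 divides every tile shape, which is why it is
# the universally safe choice on SM120 regardless of the selected tile.
for tile_m in (64, 128, 256):
    check_scale_granularity(1, tile_m)
```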

Collaborator

yzh119 commented Sep 4, 2025

There might be some misunderstanding of MmaSM here: we use it in sm100 GEMM because sm100 supports tcgen05 and 2-CTA mode (where two CTAs cooperatively perform an MMA computation).

However, sm120 does not have tcgen05 or 2-CTA MMA, so MmaSM doesn't make sense here; it should always be 1 on sm120.

The condition used to separate the two cases in https://github.com/flashinfer-ai/flashinfer/pull/1610/files#diff-0977093a8d2429e66dab4cc40f31563717098cb5aca4354a814e4208f58f068bR78 should not be MmaSM == 1, but the Cooperative/PingPong schedule: https://github.com/NVIDIA/cutlass/blob/b2dd65dc864e09688245b316ac46c4a6cd07e15c/examples/87_blackwell_geforce_gemm_blockwise/87b_blackwell_geforce_fp8_bf16_gemm_groupwise.cu#L120-L123.

Please consider the following changes:

  1. Remove the confusing MmaSM argument; there is no concept of 2SM on sm120.
  2. If we want to support both PingPong and Cooperative GEMM, please refer to https://github.com/NVIDIA/cutlass/blob/b2dd65dc864e09688245b316ac46c4a6cd07e15c/examples/87_blackwell_geforce_gemm_blockwise/87b_blackwell_geforce_fp8_bf16_gemm_groupwise.cu#L166-L171
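The suggested split can be sketched in miniature: branch on the kernel schedule rather than an MmaSM count, since SM120 has no 2-CTA MMA. This is illustrative only; the real dispatch is C++ template selection in CUTLASS, and the schedule names and tile shapes below are placeholders, not CUTLASS's actual values.

```python
# Hypothetical schedule tags; the real ones are CUTLASS C++ types.
COOPERATIVE = "cooperative"
PINGPONG = "pingpong"


def select_cta_tile(schedule: str) -> tuple:
    # On SM120 the case split is by schedule, not by an MmaSM count
    # (which would always be 1 anyway). Tile shapes are placeholders.
    if schedule == COOPERATIVE:
        return (128, 128, 128)
    if schedule == PINGPONG:
        return (64, 128, 128)
    raise ValueError(f"unknown schedule: {schedule!r}")
```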

@yongwww yongwww force-pushed the sm120_cutlass_fp8_gemm branch from 98fdd75 to 3a5cd77 Compare September 4, 2025 02:19
Collaborator Author

yongwww commented Sep 4, 2025

Thanks, @yzh119, @nvmbreughe, @aleozlx for the helpful and insightful comments! I've incorporated them. Please take a look. For the PingPong GEMM, I left it as a TODO for now; the current default in the CUTLASS examples is cooperative GEMM.

constexpr int SCALE_GRANULARITY_K = 128; /* equals the tile K dimension */ \
if (scale_granularity_m != 1) { \
TORCH_CHECK(false, \
"SM120 only supports scale_granularity_m=1 to ensure compatibility with all " \
Collaborator

@yzh119 yzh119 Sep 4, 2025


Is this still the case after your changes? If not, let's add 128 back.

Collaborator Author


Good catch! A divisor of 128 should be valid. I added support for scale_granularity_m=1 and scale_granularity_m=128; the change is: https://github.com/flashinfer-ai/flashinfer/pull/1610/files#diff-68929275a79ec730031c1d5bec894f35ba6e932a08841fdae63087a6937c0f4fR44-R52
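The relaxed validation described above might be sketched like this. The supported set `{1, 128}` follows the discussion in this thread; the function name and error text are hypothetical stand-ins for the PR's actual C++ TORCH_CHECK.

```python
# Assumed supported set, mirroring the discussion in this PR.
SUPPORTED_SCALE_GRANULARITY_M = (1, 128)


def validate_scale_granularity_m(value: int) -> None:
    """Hypothetical host-side check; the PR implements this in C++."""
    if value not in SUPPORTED_SCALE_GRANULARITY_M:
        raise ValueError(
            f"SM120 fp8 GEMM supports scale_granularity_m in "
            f"{SUPPORTED_SCALE_GRANULARITY_M}, got {value}")
```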

@yongwww yongwww force-pushed the sm120_cutlass_fp8_gemm branch from b0ed858 to 5f40472 Compare September 4, 2025 16:17
@yongwww yongwww force-pushed the sm120_cutlass_fp8_gemm branch from 3c22b87 to 305be25 Compare September 4, 2025 17:04
@yongwww yongwww merged commit 90abf04 into flashinfer-ai:main Sep 4, 2025
2 checks passed
@yongwww yongwww deleted the sm120_cutlass_fp8_gemm branch September 4, 2025 20:47