[Feature] Integrate SM100 DeepGEMM support #20087

Merged
merged 43 commits into from
Jul 11, 2025
43 commits (all by yewentao256)

5bd848f (Jun 18, 2025): integrate new deepgemm
c78b3ca (Jun 18, 2025): update unit test
3ec1674 (Jun 18, 2025): update benchmark for dense gemm
e24cd09 (Jun 18, 2025): add bench moe
a6541d1 (Jun 19, 2025): add comments back
0a2005a (Jun 19, 2025): add deprecate
de687b1 (Jun 20, 2025): Merge branch 'vllm-project:main' into wye-integrate-new-deepgemm
eb9611f (Jun 21, 2025): Merge branch 'vllm-project:main' into wye-integrate-new-deepgemm
bd52201 (Jun 23, 2025): Merge branch 'vllm-project:main' into wye-integrate-new-deepgemm
78a10ec (Jun 25, 2025): Merge remote-tracking branch 'origin/main' into wye-integrate-new-dee…
a6c8b9c (Jul 2, 2025): Merge branch 'main' into wye-integrate-new-deepgemm
fb959c5 (Jul 2, 2025): Merge remote-tracking branch 'origin/main' into wye-integrate-new-dee…
9fdf2c8 (Jul 2, 2025): update to global diff
43dc3d8 (Jul 3, 2025): wrapper for deepgemm
4bba04b (Jul 3, 2025): per block fp8
d9fd43e (Jul 3, 2025): update
b2a740b (Jul 3, 2025): add is new deep geem api
07c1ad5 (Jul 8, 2025): integration of weights
3e7f4ee (Jul 8, 2025): remove usage of group broadcast
608e645 (Jul 8, 2025): all branches goes to the deepgemm on b200
a41095a (Jul 8, 2025): Merge remote-tracking branch 'origin/main' into wye-integrate-new-dee…
5fb1949 (Jul 8, 2025): add num dispatchers
488ae84 (Jul 9, 2025): fix fallback to triton
92193c3 (Jul 9, 2025): rename to is_blackwell_deep_gemm
92267fd (Jul 10, 2025): rename function
dd7d829 (Jul 10, 2025): revert benchmark
e061ac5 (Jul 10, 2025): fix acc issue
253d5d7 (Jul 10, 2025): mark for skip
5d595ab (Jul 10, 2025): remove MOE DeepGemm
0092359 (Jul 10, 2025): Merge branch 'main' into wye-integrate-new-deepgemm
23fc1c4 (Jul 10, 2025): assert block size == 128
7ae38e3 (Jul 10, 2025): remove unnessary patch
375a150 (Jul 10, 2025): add back get_col_major_tma_aligned_tensor
6f7cacf (Jul 10, 2025): add more comments
dd4115d (Jul 10, 2025): fix interface on H100
f1b4602 (Jul 10, 2025): Merge branch 'main' into wye-integrate-new-deepgemm
5875ae6 (Jul 10, 2025): commit issue fixed
d9fcdb3 (Jul 10, 2025): delete b200 benchmark for another pr
400232b (Jul 10, 2025): add device check
8e0548f (Jul 10, 2025): add is_blackwell_deep_gemm_used check
beaa23d (Jul 10, 2025): add comments for calc_diff
a1fc38d (Jul 10, 2025): Merge branch 'main' into wye-integrate-new-deepgemm
f490831 (Jul 10, 2025): Merge branch 'main' into wye-integrate-new-deepgemm
benchmarks/kernels/benchmark_moe.py (3 additions, 0 deletions)

@@ -86,6 +86,9 @@ def benchmark_config(
             (num_experts, 2 * shard_intermediate_size), dtype=torch.float32
         )
         w2_scale = torch.randn((hidden_size, num_experts), dtype=torch.float32)
+    if use_deep_gemm:
+        # we use the default block shape for deepgemm
+        block_quant_shape = [128, 128]
     if use_fp8_w8a8:
         if block_quant_shape:
             block_n, block_k = block_quant_shape[0], block_quant_shape[1]
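The three added lines force a 128 x 128 block shape whenever `use_deep_gemm` is set, overriding any `block_quant_shape` the caller passed in. To make concrete what that shape controls, here is a minimal pure-Python sketch. It assumes block-wise FP8 quantization stores one fp32 scale per `block_n x block_k` tile of each expert's weight, with partial edge tiles rounded up. `resolve_block_shape`, `block_scale_shape`, and `ceil_div` are hypothetical helpers for illustration, not vLLM or DeepGEMM APIs.

```python
# Hypothetical helpers for illustration only; these are not vLLM or DeepGEMM APIs.

# Block shape the benchmark forces when DeepGEMM is enabled (from the diff).
DEEPGEMM_BLOCK_SHAPE = [128, 128]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division: number of size-b tiles needed to cover a."""
    return (a + b - 1) // b


def resolve_block_shape(use_deep_gemm, block_quant_shape):
    """Mirror the patched logic: DeepGEMM overrides any caller-supplied shape."""
    if use_deep_gemm:
        return list(DEEPGEMM_BLOCK_SHAPE)
    return block_quant_shape


def block_scale_shape(num_experts, n, k, block_quant_shape):
    """Assumed shape of the per-block scale tensor for an (E, N, K) expert
    weight: one scale per block_n x block_k tile, partial edge tiles included."""
    block_n, block_k = block_quant_shape[0], block_quant_shape[1]
    return (num_experts, ceil_div(n, block_n), ceil_div(k, block_k))


if __name__ == "__main__":
    shape = resolve_block_shape(use_deep_gemm=True, block_quant_shape=[64, 64])
    print(shape)  # [128, 128]
    print(block_scale_shape(8, 4096, 7168, shape))  # (8, 32, 56)
```

Under this assumption, a dimension that is not a multiple of 128 (for example N = 300) still gets a scale for its partial last tile, giving 3 tiles rather than 2.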