@zhuyuhua-v zhuyuhua-v commented Sep 8, 2025

rocm_aiter_gemm_w8a8_blockscale supports multiple dispatch methods. This PR adds an environment flag, VLLM_USE_AITER_TRITON_GEMM, to control whether this kernel uses the Triton GEMM or the CK GEMM.

Default (VLLM_USE_AITER_TRITON_GEMM=false): use the CK GEMM path with wfp8afp8.
Enabled (VLLM_USE_AITER_TRITON_GEMM=true or 1): use the Triton wfp8afp8 GEMM.
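
As a rough illustration of the flag semantics described above, here is a minimal sketch of how such a boolean environment flag could be parsed and used to pick a backend. The helper names `use_aiter_triton_gemm` and `select_gemm_backend` and the returned strings are hypothetical and are not taken from this PR; the actual parsing and dispatch code in vLLM may differ, including which spellings of the value it accepts.

```python
import os


def use_aiter_triton_gemm(environ=None) -> bool:
    """Return True when VLLM_USE_AITER_TRITON_GEMM is set to 'true' or '1'.

    Hypothetical helper: the accepted values mirror the PR description,
    compared case-insensitively (the case-insensitivity is an assumption).
    """
    env = os.environ if environ is None else environ
    value = env.get("VLLM_USE_AITER_TRITON_GEMM", "false")
    return value.strip().lower() in ("true", "1")


def select_gemm_backend(environ=None) -> str:
    """Pick a backend label for the w8a8 block-scale GEMM.

    Hypothetical helper: returns 'triton' when the flag is enabled and
    'ck' otherwise, matching the default described in the PR.
    """
    return "triton" if use_aiter_triton_gemm(environ) else "ck"
```

Under these assumptions, an unset variable or any value other than `true` or `1` falls back to the CK path, so the flag is strictly opt-in.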

Performance and accuracy comparison:
[image]
