[fix] trtllm-gen mla kernel warnings #4119

zhhuang-nv · 2025-05-07T10:15:39Z

[fix] trtllm-gen mla kernel warnings

Description

We have some warnings like these when running MLA on SM100:

[TensorRT-LLM][WARNING] Fall back to unfused MHA for dataType = bf16, dataTypeKv = bf16, dataTypeOut = bf16, forceFp32Acc = false, attentionMaskType = causal, attentionInputLayout = packed_qkv, isSPadded = false, numQHeads = 32, numKvHeads = 1, numTokensPerBlock = 32, headSize = 576, headSizeV = 576, qScaling = 1.000000, attnLogitSoftcappingScale = 0.000000, hasAlibi = false, scaleAlibi = false, tpSize = 1, tpRank = 0, sageBlockSizeQ = 0, sageBlockSizeK = 0, sageBlockSizeV = 0 in sm_100.
[TensorRT-LLM][WARNING] TRTLLM-GEN does not support the requested kernels.

The warnings were introduced by #3862. This PR fixes them by always modifying fmhaParams, regardless of whether mIsGenerationMLA is true or false.
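
The pattern behind this fix can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `FmhaParams`, `isSupported`, and `buildParams` are hypothetical stand-ins, the numeric shapes other than those copied from the warning log are illustrative, and which branch of `mIsGenerationMLA` previously skipped the update is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the real FMHA parameter struct; only fields
// that appear in the warning log are modeled.
struct FmhaParams
{
    int numQHeads;
    int numKvHeads;
    int headSize;
    int headSizeV;
};

// Hypothetical support check. The real dispatcher consults its kernel tables;
// this toy version accepts only one illustrative shape.
bool isSupported(FmhaParams const& p)
{
    return p.numKvHeads == p.numQHeads && p.headSize == 192 && p.headSizeV == 128;
}

// Builds the params that reach the support check.
// alwaysModify == false models the assumed pre-fix behavior: the update is
// skipped on one branch of isGenerationMLA.
// alwaysModify == true models the fix: the update runs on both branches.
FmhaParams buildParams(bool isGenerationMLA, bool alwaysModify)
{
    // Starting shape copied from the warning log above.
    FmhaParams p{32, 1, 576, 576};
    if (alwaysModify || !isGenerationMLA)
    {
        // Illustrative values only, chosen so the toy check passes.
        p.numKvHeads = p.numQHeads;
        p.headSize = 192;
        p.headSizeV = 128;
    }
    return p;
}
```

In this toy model, `isSupported(buildParams(true, false))` is false, which corresponds to the fallback warning being printed, while `isSupported(buildParams(true, true))` is true because the params are updated on both branches before the check runs.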

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

zhhuang-nv · 2025-05-07T10:17:22Z

/bot run

tensorrt-cicd · 2025-05-07T10:23:08Z

PR_Github #4368 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-07T12:08:39Z

PR_Github #4368 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3135 completed with status: 'FAILURE'

zhhuang-nv · 2025-05-07T12:26:15Z

/bot run

tensorrt-cicd · 2025-05-07T12:31:51Z

PR_Github #4380 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-07T13:43:41Z

PR_Github #4380 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3145 completed with status: 'FAILURE'

zhhuang-nv · 2025-05-07T14:05:48Z

/bot run

tensorrt-cicd · 2025-05-07T14:11:41Z

PR_Github #4393 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-07T15:39:16Z

PR_Github #4393 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3157 completed with status: 'FAILURE'

bobboli

This could solve the problem, but a better solution would be to not use fmhaDispatcher at all when trtllmGen is enabled.

The AttentionOp code path has become a bit messy (fmha/trtllm-gen/flashmla), especially since MLA was introduced. Let's see if we can find an opportunity to refactor.

zhhuang-nv · 2025-05-08T05:52:21Z

/bot run

tensorrt-cicd · 2025-05-08T05:57:45Z

PR_Github #4488 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-08T08:41:12Z

PR_Github #4488 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3224 completed with status: 'FAILURE'

zhhuang-nv · 2025-05-08T08:43:44Z

/bot run

tensorrt-cicd · 2025-05-08T09:19:26Z

PR_Github #4528 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-08T13:16:04Z

PR_Github #4528 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3255 completed with status: 'FAILURE'

zhhuang-nv · 2025-05-09T02:18:47Z

/bot run

tensorrt-cicd · 2025-05-09T02:24:33Z

PR_Github #4623 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-09T04:39:53Z

PR_Github #4623 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3333 completed with status: 'FAILURE'

Signed-off-by: Zhen Huang <[email protected]>

zhhuang-nv · 2025-05-09T05:57:34Z

/bot run

tensorrt-cicd · 2025-05-09T06:03:02Z

PR_Github #4660 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-09T09:48:56Z

PR_Github #4660 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3361 completed with status: 'SUCCESS'

zhhuang-nv requested review from PerkzZheng and bobboli May 7, 2025 10:15

PerkzZheng approved these changes May 7, 2025

bobboli approved these changes May 7, 2025

zhhuang-nv force-pushed the fix-mla-kernel-warnings branch from 9aec810 to 4c799c5 May 8, 2025 05:52

zhhuang-nv force-pushed the fix-mla-kernel-warnings branch from 4c799c5 to c6a7fe1 May 8, 2025 08:43

zhhuang-nv force-pushed the fix-mla-kernel-warnings branch from c6a7fe1 to bc68d82 May 9, 2025 02:18

fix trtllm-gen mla kernel warnings

db11a37

Signed-off-by: Zhen Huang <[email protected]>

zhhuang-nv force-pushed the fix-mla-kernel-warnings branch from bc68d82 to db11a37 May 9, 2025 05:57

byshiue merged commit 0a36db0 into NVIDIA:main May 9, 2025
3 checks passed

zhhuang-nv deleted the fix-mla-kernel-warnings branch May 15, 2025 07:34

5 participants
