[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe #2
Conversation
@@ -176,6 +176,7 @@ def run_cutlass_moe_fp8(
     c1 = _resize_cache(workspace13, (M * topk, N * 2))
     c2 = _resize_cache(workspace2, (M * topk, N))
     c3 = _resize_cache(workspace13, (M * topk, K))
+    c1.fill_(0)
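For context, a minimal sketch of why the fill is needed (names and shapes here are illustrative, not vLLM's actual code): the c1/c2/c3 buffers are views carved out of shared scratch space, and torch.empty-backed scratch can hold stale values from earlier kernels.

```python
# Minimal sketch (assumed shapes/names, not vLLM's actual code) of why a
# reused workspace needs zeroing: torch.empty returns uninitialized memory,
# so a view carved out of shared scratch can contain stale values.
import torch

workspace13 = torch.empty(1 << 20, dtype=torch.float32)  # shared scratch
workspace13.uniform_(-100, 100)  # simulate stale data from an earlier op

M, topk, N = 8, 2, 64
# _resize_cache-style view: reinterpret the front of the buffer as c1
c1 = workspace13[: M * topk * 2 * N].view(M * topk, 2 * N)
c1.fill_(0)  # the fix: rows no expert writes now hold zeros, not garbage
```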
great! any way to capture this in test_cutlass_moe?
yep, added a couple unit tests
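A hedged sketch of what such a regression test might look like (helper names and values are illustrative, not the actual tests added in the PR):

```python
# Hypothetical regression test, not the PR's actual test: it checks that
# garbage in unused workspace rows inflates the per-tensor FP8 scale,
# while a zero-filled workspace leaves the scale driven by real data only.
import torch

def per_tensor_scale(x: torch.Tensor) -> torch.Tensor:
    # per-tensor scale = amax over the whole buffer / fp8 e4m3 max (448)
    return x.abs().max() / torch.finfo(torch.float8_e4m3fn).max

def test_zero_filled_workspace_preserves_scale():
    torch.manual_seed(0)
    valid = torch.randn(4, 16)                     # rows actually written
    buf = torch.empty(64, 16).uniform_(-1e4, 1e4)  # "stale" workspace
    buf[:4] = valid
    dirty = per_tensor_scale(buf)

    buf.zero_()                                    # the fix under test
    buf[:4] = valid
    clean = per_tensor_scale(buf)
    assert clean < dirty  # garbage rows inflate the quantization scale
```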
Signed-off-by: Ming Yang <[email protected]>
Force-pushed from 52be3eb to 66c457b
Force-pushed from 66c457b to 25d3af8
if expert_map is not None:
    c1.fill_(0)
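A toy illustration of why the fill is only needed when an expert_map is present (semantics assumed from typical expert-parallel MoE code, not taken from the vLLM kernel):

```python
# Toy illustration (assumed semantics): expert_map marks experts hosted on
# other ranks with -1; tokens routed to them leave their c1 rows unwritten,
# so those rows keep whatever torch.empty left behind unless zero-filled.
import torch

expert_map = torch.tensor([0, 1, -1, -1])   # experts 2 and 3 are remote
topk_ids = torch.tensor([[0, 2], [3, 1]])   # per-token expert choices
written = expert_map[topk_ids] >= 0         # rows the local GEMM produces
print(written)
# tensor([[ True, False],
#         [False,  True]])  -> the False rows need c1.fill_(0)
```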
One more tiny thing: can you check if we need to do this if per_act_token is true?
No, we don't. I figured out that the root cause is the random data in the unused space of c1 making the scale (computed over the whole of c1) larger, resulting in precision loss for the actual data. So with per_act_token==True, the scales won't be impacted. Let me update the PR in vllm-project.
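A small numeric sketch of this root cause (fp8 e4m3 max of 448 assumed; values are illustrative):

```python
# Numeric sketch of the root cause: one garbage row inflates the per-tensor
# amax, coarsening the quantization step for every real row, while per-token
# (per-row) scales are unaffected.
import torch

FP8_MAX = 448.0                           # e4m3 max representable value
real = torch.full((3, 4), 2.0)            # actual activations
garbage = torch.full((1, 4), 1e4)         # stale data in an unused row
buf = torch.cat([real, garbage])

per_tensor_step = buf.abs().max() / FP8_MAX        # ~22.3
per_token_step = real.abs().amax(dim=1) / FP8_MAX  # ~0.0045 per real row
print(per_tensor_step, per_token_step)
# With the per-tensor scale, every real value of 2.0 rounds to 0 after
# quantization; with per-token scales the real rows keep full precision.
```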
I'll close this PR to avoid confusion -- this was an experimental PR for early review.
Moved to vllm-project#20167. Closing.
Purpose
vllm-project#19667 changed the workspace creation from torch.zeros to torch.empty. This ends up causing correctness issues for models using cutlass_moe, e.g. Maverick in our test case. This PR fixes the correctness issue by explicitly filling zeros in cutlass_moe.

Test Plan
lm_eval, unit tests
Test Result
lm_eval results:
local-chat-completions (model=meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8,base_url=http://127.0.0.1:8081/v1/chat/completions,num_concurrent=32), gen_kwargs: (None), limit: 200.0, num_fewshot: 5, batch_size: 1
Unit test stability verified:
- Without c1.fill_(0), the repro one-liner fails consistently.
- With c1.fill_(0), it passes consistently.

(Optional) Documentation Update