Llama4 MoE Grouped GEMM
Add a reference `Llama4TextMoe` layer that uses the grouped GEMM kernel from #2465. Note that, due to structural differences from `Qwen3` and more typical MoE architectures, some of the fusions from the original implementation are not applicable. There are also additional optimization opportunities around the router, token shuffling, and the shared expert calculation, to be addressed in future PRs.
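
For orientation, here is a minimal, hedged sketch of how a Llama4-style MoE forward pass can combine the router, token shuffling, grouped expert GEMMs, and the always-on shared expert. This is not the PR's code: `llama4_moe_forward` and its weight layout are illustrative assumptions, the per-expert Python loop stands in for the single Triton grouped GEMM launch, and the experts' gated MLP is simplified to a plain SiLU MLP.

```python
import torch
import torch.nn.functional as F

def llama4_moe_forward(hidden, router_w, expert_w1, expert_w2, shared_mlp, top_k=1):
    """Hedged sketch of a Llama4-style MoE forward pass (names/layout assumed).

    hidden:     (num_tokens, d_model)
    router_w:   (d_model, num_experts) router projection
    expert_w1:  (num_experts, d_model, d_ff) grouped up-projection weights
    expert_w2:  (num_experts, d_ff, d_model) grouped down-projection weights
    shared_mlp: callable applied to every token (the shared expert)
    """
    logits = hidden @ router_w                      # (tokens, experts)
    scores, expert_ids = logits.topk(top_k, dim=-1)
    scores = torch.sigmoid(scores)

    # Token shuffling: sort tokens by assigned expert so each expert's
    # tokens form a contiguous segment, which is the layout a grouped
    # GEMM kernel expects.
    flat_ids = expert_ids.flatten()
    order = flat_ids.argsort()
    gathered = hidden.repeat_interleave(top_k, dim=0)[order]

    # Grouped GEMM over per-expert segments. A real kernel (e.g. the
    # Triton kernel this PR wraps) fuses this loop into one launch.
    out = torch.empty_like(gathered)
    counts = torch.bincount(flat_ids, minlength=expert_w1.shape[0])
    start = 0
    for e, n in enumerate(counts.tolist()):
        if n == 0:
            continue
        seg = gathered[start:start + n]
        out[start:start + n] = F.silu(seg @ expert_w1[e]) @ expert_w2[e]
        start += n

    # Un-shuffle, apply routing weights (where exactly the scaling happens
    # differs across architectures; applied after the experts here for
    # simplicity), then add the shared expert, which runs on all tokens.
    unshuffled = torch.empty_like(out)
    unshuffled[order] = out
    routed = (unshuffled * scores.flatten().unsqueeze(-1)).view(
        hidden.shape[0], top_k, -1).sum(dim=1)
    return routed + shared_mlp(hidden)
```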
Updates:

- `tests/test_llama4_moe.py`: correctness tests for the `Llama4` Triton grouped GEMM layer (see the sketch after this list).
- `benchmark/benchmark_fused_moe.py`: updated to include `Llama4` in addition to `Qwen3`.
- `grouped_gemm/reference/layers/llama4_moe.py`: implementation of the HF `Llama4TextMoe` with the Triton grouped GEMM kernel.
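
As a hedged illustration of the test pattern (not the actual contents of `tests/test_llama4_moe.py`), a correctness test for this kind of layer typically compares the shuffled grouped-GEMM path against a naive per-token reference. `naive_moe` below is hypothetical and reuses the `llama4_moe_forward` sketch above; in the real tests the grouped path would be the Triton kernel layer instead.

```python
import torch

def naive_moe(hidden, router_w, w1, w2, top_k=1):
    """Per-token reference: route each token and run its expert MLP directly."""
    logits = hidden @ router_w
    scores, ids = logits.topk(top_k, dim=-1)
    scores = torch.sigmoid(scores)
    out = torch.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        for k in range(top_k):
            e = ids[t, k]
            h = torch.nn.functional.silu(hidden[t] @ w1[e]) @ w2[e]
            out[t] += scores[t, k] * h
    return out

# Illustrative shapes; shared expert zeroed out to isolate the routed path.
torch.manual_seed(0)
tokens, d_model, d_ff, n_exp = 32, 64, 128, 4
hidden = torch.randn(tokens, d_model)
router_w = torch.randn(d_model, n_exp)
w1 = torch.randn(n_exp, d_model, d_ff) * 0.02
w2 = torch.randn(n_exp, d_ff, d_model) * 0.02

ref = naive_moe(hidden, router_w, w1, w2)
grouped = llama4_moe_forward(hidden, router_w, w1, w2,
                             shared_mlp=lambda x: torch.zeros_like(x))
torch.testing.assert_close(grouped, ref, rtol=1e-4, atol=1e-4)
```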