
@jeromeku (Collaborator) commented May 27, 2025

Llama4 MoE Grouped GEMM

Add a reference Llama4TextMoe layer that uses the grouped GEMM kernel from #2465.

Note that, due to structural differences between Llama4 and both Qwen3 and more typical MoE architectures, some of the fusions from the original implementation are not applicable.

There are also additional optimization opportunities around the router, token shuffling, and the shared-expert calculation; these will be addressed in future PRs.
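
For orientation, here is a minimal, self-contained sketch of how a Llama4-style MoE forward pass can be organized around a grouped GEMM. This is not the PR's actual code: `ToyLlama4MoE` and `grouped_gemm_ref` are illustrative names, the naive per-expert loop stands in for the Triton kernel from #2465, and the router scaling and shared-expert placement are simplifying assumptions.

```python
# Illustrative sketch only; parameter names and routing details are assumptions,
# not the API of grouped_gemm/reference/layers/llama4_moe.py.
import torch
import torch.nn.functional as F


def grouped_gemm_ref(x_sorted, weights, group_sizes):
    """Naive stand-in for the Triton grouped GEMM: one matmul per expert group."""
    outs, start = [], 0
    for e, size in enumerate(group_sizes.tolist()):
        outs.append(x_sorted[start:start + size] @ weights[e])
        start += size
    return torch.cat(outs, dim=0)


class ToyLlama4MoE(torch.nn.Module):
    def __init__(self, hidden, ffn, num_experts, top_k=1):
        super().__init__()
        self.top_k = top_k
        self.router = torch.nn.Linear(hidden, num_experts, bias=False)
        # Routed experts: fused gate/up projection plus down projection.
        self.gate_up = torch.nn.Parameter(torch.randn(num_experts, hidden, 2 * ffn) * 0.02)
        self.down = torch.nn.Parameter(torch.randn(num_experts, ffn, hidden) * 0.02)
        # Shared expert applied to every token (one structural difference from Qwen3).
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(hidden, ffn, bias=False),
            torch.nn.SiLU(),
            torch.nn.Linear(ffn, hidden, bias=False),
        )

    def forward(self, x):  # x: (tokens, hidden)
        logits = self.router(x)
        scores, expert_idx = logits.topk(self.top_k, dim=-1)
        expert_idx = expert_idx.flatten()
        # Token shuffle: sort tokens by expert so each expert's tokens are contiguous.
        order = expert_idx.argsort()
        group_sizes = torch.bincount(expert_idx, minlength=self.gate_up.shape[0])
        # Scale routed inputs by the (sigmoid) router score, then run grouped GEMMs.
        routed_in = x.repeat_interleave(self.top_k, dim=0)
        routed_in = (routed_in * torch.sigmoid(scores).flatten()[:, None])[order]
        gate_up = grouped_gemm_ref(routed_in, self.gate_up, group_sizes)
        gate, up = gate_up.chunk(2, dim=-1)
        routed_out = grouped_gemm_ref(F.silu(gate) * up, self.down, group_sizes)
        # Unshuffle, aggregate the top-k copies per token, and add the shared expert.
        out = torch.zeros_like(routed_out)
        out[order] = routed_out
        return self.shared(x) + out.view(x.shape[0], self.top_k, -1).sum(dim=1)
```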

Updates:

  • tests/test_llama4_moe.py - correctness tests for the Llama4 Triton grouped GEMM layer (a toy smoke test is sketched after this list).
  • benchmark/benchmark_fused_moe.py - updated to include Llama4 in addition to Qwen3.
  • grouped_gemm/reference/layers/llama4_moe.py - implementation of the HF Llama4TextMoe layer using the Triton grouped GEMM kernel.
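
As a rough analogue of what the correctness tests exercise (the PR's tests compare against the HF layer; this is only a run/shape check on the toy sketch above):

```python
# Smoke test of the illustrative ToyLlama4MoE sketch; not the PR's test code.
torch.manual_seed(0)
moe = ToyLlama4MoE(hidden=64, ffn=128, num_experts=4, top_k=1)
x = torch.randn(16, 64)
y = moe(x)
assert y.shape == x.shape  # output keeps the (tokens, hidden) shape
```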

@danielhanchen danielhanchen merged commit 0ba1bda into unslothai:main May 28, 2025