
Conversation

@BoxiangW (Contributor) commented Sep 18, 2025

Features in this PR

  • Supports local Muon
  • Supports distributed Muon
  • Tested on dense and MoE models

Requires https://github.com/NVIDIA-NeMo/Emerging-Optimizers (the Muon update it provides is sketched below).
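
For readers unfamiliar with Muon: the optimizer keeps heavy-ball momentum for each 2D weight matrix and approximately orthogonalizes it with a Newton-Schulz iteration before applying the update. Below is a minimal PyTorch sketch of that update rule; the coefficients and shape-dependent scaling follow the public Muon write-up, the names are illustrative, and this is not the Emerging-Optimizers API.

import torch

def newton_schulz(G, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration that drives the singular values of G
    # toward 1 (coefficients from the public Muon write-up).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)   # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:             # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    # One local Muon step for a single 2D weight matrix.
    momentum_buf.mul_(beta).add_(grad)    # heavy-ball momentum
    update = newton_schulz(momentum_buf)  # orthogonalized direction
    # shape-dependent scale, following one common Muon variant
    scale = max(1.0, param.size(0) / param.size(1)) ** 0.5
    param.data.add_(update, alpha=-lr * scale)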

Note: This PR includes an example script that uses a custom model (Llama 3 8B with 8 experts); change the model structure if needed.

To use the dist_muon or muon optimizer, run:

# on a compute node
cd <workspace>
# clone the optimizer library and the PR branch of Megatron-LM
git clone https://github.com/NVIDIA-NeMo/Emerging-Optimizers.git
git clone https://github.com/BoxiangW/Megatron-LM.git
cd Megatron-LM/
git checkout boxiangw/llm-shower-repo
cd ..

# launch the example training script
bash Megatron-LM/muon.sh

To use the example script:

  1. Add your wandb API key and entity to the script (optional).
  2. Add your tokenizer model path and data path to the script.
  3. Select an optimizer from [muon, dist_muon, adam]; it defaults to dist_muon (see the dispatch sketch after this list).
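
A hypothetical sketch of how the script's optimizer switch could be parsed on the Python side; the flag name and dispatch here are assumptions, not what muon.sh actually passes through:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--optimizer", choices=["muon", "dist_muon", "adam"],
                    default="dist_muon")  # mirrors the script's default
args = parser.parse_args()

if args.optimizer == "adam":
    print("using Megatron's standard Adam path")
else:
    # both muon and dist_muon would come from Emerging-Optimizers
    print(f"building {args.optimizer} from Emerging-Optimizers")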

Signed-off-by: Boxiang Wang <[email protected]>
@copy-pr-bot (bot) commented Sep 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

BoxiangW self-assigned this Sep 18, 2025
@BoxiangW (Contributor, Author) commented

Closing this PR in favor of the https://github.com/NVIDIA/Megatron-LM/tree/dev branch.

@valentyn1boreiko commented
Hi @BoxiangW, thank you for the implementation! Going through the code, it doesn't really support distributed Muon, does it? Also, distrib_optimizer.py is still written mostly for Adam and is not touched in this PR.
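
For context on the question: Megatron's distributed optimizer shards optimizer state across data-parallel ranks, while Muon's Newton-Schulz step operates on a whole 2D matrix at once. Below is a hedged sketch of one way a distributed Muon step could bridge that gap (gather, orthogonalize, re-shard); it is an illustration of the concept, not code from this PR, and newton_schulz refers to the sketch earlier in this thread:

import torch
import torch.distributed as dist

def dist_muon_update(local_shard, lr, group=None):
    # Assumes the momentum matrix is row-sharded evenly across the group.
    world = dist.get_world_size(group)
    rank = dist.get_rank(group)
    shards = [torch.empty_like(local_shard) for _ in range(world)]
    dist.all_gather(shards, local_shard, group=group)  # 1) reassemble
    full = newton_schulz(torch.cat(shards, dim=0))     # 2) orthogonalize
    return -lr * full.chunk(world, dim=0)[rank]        # 3) keep our slice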

