Dist_Muon optimizer support #1813
Signed-off-by: Boxiang Wang <[email protected]>
Closing this PR in favor of the dev branch: https://github.com/NVIDIA/Megatron-LM/tree/dev
Hi @BoxiangW, thank you for the implementation! Going through the code, it doesn't really support distributed Muon, does it? Also, distrib_optimizer.py is still mostly for Adam and is not touched in this PR.
Hi, we have merged this into the new dev branch and added TP support as well: https://github.com/NVIDIA/Megatron-LM/blob/dev/megatron/core/optimizer/layer_wise_optimizer.py. TP: https://github.com/NVIDIA/Megatron-LM/blob/dev/megatron/core/optimizer/muon.py
Features in this PR
Requires https://github.com/NVIDIA-NeMo/Emerging-Optimizers
Note: This PR includes an example script that uses a custom model (Llama 3 8B with 8 experts); please change the model structure if needed.
To use the dist_muon or muon optimizer, please run
To use the example script, please
[muon, dist_muon, adam], defaults to dist_muon
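The option name for the choice list above did not survive on this page, so here is only a rough sketch of how such a three-way selection with a dist_muon default could be declared using Python's standard argparse. The flag name `--optimizer-name` and the helper functions are hypothetical and are not taken from this PR or from Megatron-LM.

```python
import argparse

# Choices and default come from the PR description above.
OPTIMIZER_CHOICES = ["muon", "dist_muon", "adam"]


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with one optimizer-selection option."""
    parser = argparse.ArgumentParser(description="Illustrative optimizer selection")
    parser.add_argument(
        "--optimizer-name",  # hypothetical flag name, assumed for illustration only
        choices=OPTIMIZER_CHOICES,
        default="dist_muon",
        help="Optimizer to use: muon, dist_muon, or adam.",
    )
    return parser


def parse_optimizer(argv: list[str]) -> str:
    """Return the selected optimizer name for the given argument list."""
    return build_parser().parse_args(argv).optimizer_name
```

With this sketch, an empty argument list selects dist_muon, and any value outside the three choices is rejected by argparse before training code would run.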