WIP "Faster" grpo trainer #371
Conversation
please my goat push this to main pleaseeee

my project is waiting for ur code no joke

hi 👉👈

please finish this my goat

@troy12x please avoid spamming 🙏 it doesn't help, be sure that we're working hard on this.

sry i dont mean but i really need you guys to finish this fast

good luck !

Hello all, thanks for your work. I would like to check out this branch and give it a try. Cheers,
maybe a naive question: why is "faster" in quotes in the title of this PR? Is it because there is no overlapping between the training and inference step which limits the speed of this multi-node setup? |
I'm trying to do a similar thing, a difference is that I use VLLM for generation and use NCCL to transfer model parameters: |
closing in favor of #533 |
GRPO trainer to train on N + 1 nodes, with 1 node allocated for generation. Very experimental, so expect hard edges!
Usage:
For training, run:
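The exact command was not preserved on this page; the following is a hypothetical Slurm submission in the style this PR describes, with a placeholder script name, node count, and model argument:

```shell
# Hypothetical: N training nodes + 1 generation node (script name,
# flags, and model path are illustrative placeholders).
sbatch --nodes=5 slurm/grpo_train.slurm \
    --model_name_or_path Qwen/Qwen2.5-7B \
    --num_generations 8
```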
This will automatically spin up an SGLang server on a separate Slurm node and use it for generation.
For development, first spin up an SGLang server on a separate node:
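A typical SGLang server launch looks like the following (this is the standard `sglang.launch_server` entry point; the model path is a placeholder, not necessarily the one used in this PR):

```shell
# Serve a model with SGLang on the generation node, reachable from the
# training nodes on port 30000.
python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-7B-Instruct \
    --host 0.0.0.0 \
    --port 30000
```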
Then run training by providing the IP address of the server:
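The exact argument name on this branch is not shown here; a hypothetical invocation, with both the launcher and the server-address flag as placeholders, might look like:

```shell
# Hypothetical: point the trainer at the already-running SGLang server
# instead of letting it spin one up.
accelerate launch train_grpo.py \
    --sglang_server_url http://<generation-node-ip>:30000
```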
TODO