Conversation

@edbeeching
Collaborator

@edbeeching edbeeching commented Feb 19, 2025

This PR adds a GRPO trainer that trains on N + 1 nodes, with 1 node allocated for generation. Very experimental, so expect rough edges!

Usage:

For training, run:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml

This will automatically spin up an SGLang server on a separate Slurm node and use it for generation.
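As a rough illustration of what "use it for generation" involves, the sketch below builds a request for SGLang's native `/generate` HTTP endpoint. This helper is an assumption for illustration, not code from this PR, and the exact field names should be checked against the SGLang version in use; since the server is launched with `--skip-tokenizer-init`, it expects pre-tokenized `input_ids` rather than raw text.

```python
# Hypothetical sketch (NOT code from this PR) of how a trainer process
# might build a generation request for the SGLang native /generate API.
# With --skip-tokenizer-init the server takes token ids, not raw text.

def build_generate_request(host: str, port: int, input_ids: list[int],
                           max_new_tokens: int = 256,
                           temperature: float = 0.7):
    """Return the (url, json_payload) pair for a POST to /generate."""
    url = f"http://{host}:{port}/generate"
    payload = {
        "input_ids": input_ids,  # pre-tokenized prompt
        "sampling_params": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }
    return url, payload

url, payload = build_generate_request("ip-26-0-160-103", 30010, [1, 2, 3])
# The training side would then POST it, e.g. requests.post(url, json=payload)
```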

For development, first spin up an SGLang server on a separate node:

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-1.5B-Instruct \
    --port=30010 --skip-tokenizer-init --mem-fraction-static 0.7 \
    --host=0.0.0.0 --dp-size=8

Then run training by providing the IP address of the server:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml \
    --remote_gen_model_url ip-26-0-160-103
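Note that `--remote_gen_model_url` above takes a bare Slurm hostname. A hypothetical normalization helper (an assumption for illustration, not code from this PR) might resolve such a value into a full endpoint, defaulting the scheme and port to match the `sglang.launch_server` invocation above:

```python
# Sketch (assumption, not from the PR): turn a bare hostname such as
# "ip-26-0-160-103" into a full base URL, defaulting the scheme to
# http and the port to 30010 to match the server launch command.

def normalize_remote_url(raw: str, default_port: int = 30010) -> str:
    url = raw if raw.startswith(("http://", "https://")) else f"http://{raw}"
    # Append the default port only if none was given after the host.
    host_part = url.split("://", 1)[1]
    if ":" not in host_part:
        url = f"{url}:{default_port}"
    return url

print(normalize_remote_url("ip-26-0-160-103"))
# -> http://ip-26-0-160-103:30010
```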

TODO

  • Remove hard-coded filepath for temporary checkpoints
  • Refactor reference model log probs to happen within generation step
  • Implement μ iterations (multiple policy updates per sampled batch) from GRPO
  • Validate against TRL
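On the μ-iterations TODO: in the GRPO paper (DeepSeekMath), μ is the number of policy-update steps performed on each sampled batch of completions, which is why the clipped surrogate is needed at all. A sketch of the objective those inner iterations optimize, following the paper's notation (this PR does not yet implement it):

```latex
% GRPO clipped surrogate, optimized for \mu inner steps per sampled batch.
% r_{i,t}(\theta) is the token-level importance ratio against the policy
% \pi_{\theta_{\mathrm{old}}} that generated the G completions o_1..o_G;
% \hat{A}_{i,t} is the group-normalized advantage.
J(\theta) = \mathbb{E}\Big[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
  \min\big(r_{i,t}(\theta)\,\hat{A}_{i,t},\;
  \mathrm{clip}(r_{i,t}(\theta),\,1-\varepsilon,\,1+\varepsilon)\,\hat{A}_{i,t}\big)\Big]
  - \beta\, D_{\mathrm{KL}}\big[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big]
```

With μ = 1 the ratio is identically 1 and the clip is inactive, which matches the current behavior of this branch.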

@troy12x

troy12x commented Feb 22, 2025

please my goat push this to main pleaseeee

@troy12x

troy12x commented Feb 22, 2025

my project is waiting for ur code no joke

@troy12x

troy12x commented Feb 24, 2025

hi 👉👈

@troy12x

troy12x commented Feb 24, 2025

please finish this my goat

@qgallouedec
Member

@troy12x please avoid spamming 🙏 It doesn't help; rest assured that we're working hard on this.

@troy12x

troy12x commented Feb 25, 2025

sorry, I don't mean to, but I really need you guys to finish this fast

@troy12x

troy12x commented Feb 25, 2025

good luck !

@arthurPignetOwkin

Hello all, thanks for your work.

I would like to check out this branch and give it a try.
Can you point me toward the best commit to check out? Otherwise I will go with the latest.

Cheers,

@min-xu-et

Maybe a naive question: why is "faster" in quotes in the title of this PR? Is it because there is no overlap between the training and inference steps, which limits the speed of this multi-node setup?

@binary-husky
Contributor

I'm trying to do a similar thing; one difference is that I use vLLM for generation and NCCL to transfer model parameters:

huggingface/trl#3094

@edbeeching
Collaborator Author

edbeeching commented Mar 21, 2025

Closing in favor of #533.
@arthurPignetOwkin & @min-xu-et please refer to that implementation, which will be merged soon.

@edbeeching edbeeching closed this Mar 21, 2025
