Conversation

@edbeeching
Collaborator

@edbeeching edbeeching commented Feb 19, 2025

This PR adds a GRPO trainer that trains on N + 1 nodes, with 1 node allocated for generation. Very experimental, so expect rough edges!

Usage:

For training, run:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml

This will automatically spin up an SGLang server on a separate Slurm node and use it for generation.
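As a rough illustration of what "use it for generation" involves, the sketch below builds a request for SGLang's native `/generate` HTTP endpoint. This helper is an assumption for illustration, not code from this PR, and the exact field names should be checked against the SGLang version in use; since the server is launched with `--skip-tokenizer-init`, it expects pre-tokenized `input_ids` rather than raw text.

```python
# Hypothetical sketch (NOT code from this PR) of how a trainer process
# might build a generation request for the SGLang native /generate API.
# With --skip-tokenizer-init the server takes token ids, not raw text.

def build_generate_request(host: str, port: int, input_ids: list[int],
                           max_new_tokens: int = 256,
                           temperature: float = 0.7):
    """Return the (url, json_payload) pair for a POST to /generate."""
    url = f"http://{host}:{port}/generate"
    payload = {
        "input_ids": input_ids,  # pre-tokenized prompt
        "sampling_params": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }
    return url, payload

url, payload = build_generate_request("ip-26-0-160-103", 30010, [1, 2, 3])
# The training side would then POST it, e.g. requests.post(url, json=payload)
```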

For development, first spin up an SGLang server on a separate node:

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-1.5B-Instruct \
    --port=30010 --skip-tokenizer-init --mem-fraction-static 0.7 \
    --host=0.0.0.0 --dp-size=8

Then run training by providing the IP address of the server:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml \
    --remote_gen_model_url ip-26-0-160-103
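Note that `--remote_gen_model_url` above takes a bare Slurm hostname. A hypothetical normalization helper (an assumption for illustration, not code from this PR) might resolve such a value into a full endpoint, defaulting the scheme and port to match the `sglang.launch_server` invocation above:

```python
# Sketch (assumption, not from the PR): turn a bare hostname such as
# "ip-26-0-160-103" into a full base URL, defaulting the scheme to
# http and the port to 30010 to match the server launch command.

def normalize_remote_url(raw: str, default_port: int = 30010) -> str:
    url = raw if raw.startswith(("http://", "https://")) else f"http://{raw}"
    # Append the default port only if none was given after the host.
    host_part = url.split("://", 1)[1]
    if ":" not in host_part:
        url = f"{url}:{default_port}"
    return url

print(normalize_remote_url("ip-26-0-160-103"))
# -> http://ip-26-0-160-103:30010
```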

TODO

  • Remove hard-coded filepath for temporary checkpoints
  • Refactor reference model log probs to happen within generation step
  • Implement μ iterations (multiple policy updates per sampled batch) from GRPO
  • Validate against TRL
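On the μ-iterations TODO: in the GRPO paper (DeepSeekMath), μ is the number of policy-update steps performed on each sampled batch of completions, which is why the clipped surrogate is needed at all. A sketch of the objective those inner iterations optimize, following the paper's notation (this PR does not yet implement it):

```latex
% GRPO clipped surrogate, optimized for \mu inner steps per sampled batch.
% r_{i,t}(\theta) is the token-level importance ratio against the policy
% \pi_{\theta_{\mathrm{old}}} that generated the G completions o_1..o_G;
% \hat{A}_{i,t} is the group-normalized advantage.
J(\theta) = \mathbb{E}\Big[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
  \min\big(r_{i,t}(\theta)\,\hat{A}_{i,t},\;
  \mathrm{clip}(r_{i,t}(\theta),\,1-\varepsilon,\,1+\varepsilon)\,\hat{A}_{i,t}\big)\Big]
  - \beta\, D_{\mathrm{KL}}\big[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big]
```

With μ = 1 the ratio is identically 1 and the clip is inactive, which matches the current behavior of this branch.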

@troy12x

troy12x commented Feb 22, 2025

please my goat push this to main pleaseeee

@troy12x

troy12x commented Feb 22, 2025

my project is waiting for ur code no joke

@troy12x

troy12x commented Feb 24, 2025

hi 👉👈

@troy12x

troy12x commented Feb 24, 2025

please finish this my goat

@qgallouedec
Member

@troy12x please avoid spamming 🙏 It doesn't help; rest assured that we're working hard on this.

@troy12x

troy12x commented Feb 25, 2025

sorry, I don't mean to, but I really need you guys to finish this fast

@troy12x

troy12x commented Feb 25, 2025

good luck !

@arthurPignetOwkin

Hello all, thanks for your work.

I would like to check out this branch and give it a try.
Can you point me toward the best commit to check out? Otherwise I will go with the latest.

Cheers,

@min-xu-et

Maybe a naive question: why is "faster" in quotes in the title of this PR? Is it because there is no overlap between the training and inference steps, which limits the speed of this multi-node setup?

@binary-husky
Contributor

I'm trying to do a similar thing; one difference is that I use vLLM for generation and NCCL to transfer model parameters:

huggingface/trl#3094

@edbeeching
Collaborator Author

edbeeching commented Mar 21, 2025

Closing in favor of #533.
@arthurPignetOwkin & @min-xu-et please refer to that implementation, which will be merged soon.

@edbeeching edbeeching closed this Mar 21, 2025
