@PrinsYin PrinsYin commented Jun 21, 2025

What does this PR do?

This PR re-implements and rebases the original work from #3370, which adds SGLang support as a rollout engine for GRPO training in TRL.

We have rebased the original PR onto the latest main branch and are testing the behavior of weight updates.

See also #103

@ChangyiYang

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ChangyiYang

Test results will be posted in this comment.

More testing is ongoing.

kashif (Collaborator) commented Jun 22, 2025

shall we close the other PR?

requests; python_version < "3.13"
uvicorn; python_version < "3.13"

sglang =

Should this be sglang[all]>=0.4.6post2?
Without the [all] extra, I'm getting lots of ModuleNotFoundError exceptions.
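
A quick way to see which optional modules are missing before launching anything is to probe for them without importing. The following is a minimal sketch, not code from this PR: the helper names are hypothetical, and the package list only reuses names visible in the quoted `setup.cfg` excerpt. Which modules `sglang[all]` actually pulls in is not stated in this thread.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found for import."""
    # find_spec locates the package without executing its import-time code.
    return importlib.util.find_spec(name) is not None


def missing_packages(names):
    """Return the subset of `names` that cannot be found, preserving order."""
    return [name for name in names if not is_package_available(name)]


if __name__ == "__main__":
    # Hypothetical check list: names taken from the quoted setup.cfg excerpt.
    missing = missing_packages(["sglang", "requests", "uvicorn"])
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All optional dependencies found.")
```

A check like this reports every missing package at once, instead of failing on the first ModuleNotFoundError.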

Comment on lines +256 to +262
```bash
# start sglang-server
python3 -m sglang.launch_server --model-path qwen/qwen2.5-7b-instruct

# run "export CUDA_VISIBLE_DEVICES"
# run script
python3 grpo_test.py
```

where is grpo_test.py?
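
The quoted snippet starts the server and then runs the training script as two separate steps, so the script can start before the server is ready to accept requests. One way to guard against that is to poll the server first. This is a sketch under assumptions, not part of the PR: the `http://localhost:30000/health` URL (default port and health route) is assumed rather than taken from this thread, and `wait_for_server` and `http_probe` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_for_server(url, timeout=300.0, interval=2.0,
                    probe=http_probe, sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until `probe` succeeds or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Assumed endpoint; adjust host, port, and route to the actual server.
    ready = wait_for_server("http://localhost:30000/health", timeout=60.0)
    print("server ready" if ready else "server did not come up in time")
```

The `probe`, `sleep`, and `clock` parameters are injectable so the loop can be exercised without a running server.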

from trl import GRPOConfig, GRPOTrainer


dataset = load_dataset("trl-lib/tldr", split="train[:10%]”)

split="train[:10%]” should be split="train[:10%]"
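
The typographic closing quote (U+201D) flagged above is easy to miss by eye, but Python does not accept it as a string terminator. As a small illustration (the `parses` helper is hypothetical), compiling both versions of the line without executing them shows that only the straight-quote version parses:

```python
def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        # compile() only parses the code; load_dataset is never called.
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# \u201d is the curly closing quote from the quoted diff line.
broken = 'dataset = load_dataset("trl-lib/tldr", split="train[:10%]\u201d)'
fixed = 'dataset = load_dataset("trl-lib/tldr", split="train[:10%]")'

if __name__ == "__main__":
    print("broken parses:", parses(broken))
    print("fixed parses:", parses(fixed))
```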

Turn to the online server API Usage

add test and fix bugs in result parsing

Pass First test with fixing _update_sglang_weights

Remove checkpoints from tracking and add to .gitignore

config to run on single gpu successfully

Update code to align with vllm

save model and update weight

save model only main process

A runnable update_from_tensor version

fix performance issue

resolve comment: help strings

resolve comment: help strings

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_config.py

Update trl/trainer/grpo_trainer.py

call raise_for_status

remove duplicate

doc string

formatting

add sglang to extras

formatting

import requests only when sglang is available

formatting

undo formatting

undo formatting

more undo

last one!

add initial docs

add sglang

last one now

new line

delete test scripts

Update setup.cfg

Update setup.cfg

initial sglang-serve cli script

Update trl/trainer/grpo_trainer.py

remove dead code

Co-authored-by: Kashif Rasul <[email protected]>

debug GRPO trainer

change num_processes

update how to run sglang