Feature: Add SGLang support for GRPO Trainer #3627
base: main
Testing result will be this comment:
More testing ongoing

Shall we close the other PR?
requests; python_version < "3.13"
uvicorn; python_version < "3.13"

sglang =
Should this be sglang[all]>=0.4.6post2?
Without the [all], I'm getting lots of ModuleNotFoundErrors.
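A quick way to see which imports are missing in a given environment is a check like the sketch below. The module names passed in are placeholders chosen for illustration, not a list taken from this PR, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed names for illustration only; substitute whichever modules
    # appear in the ModuleNotFoundError messages you are seeing.
    print(missing_modules(["sglang", "requests", "uvicorn"]))
```

An empty list means every listed module was found; anything printed is a candidate for the extras list.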
# start sglang-server
python3 -m sglang.launch_server --model-path qwen/qwen2.5-7b-instruct

# run "export CUDA_VISIBLE_DEVICES"
# run script
python3 grpo_test.py
```
where is grpo_test.py?
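Separately from where the script lives, the two steps in the snippet (start the server, then run the script) only work if the server is already accepting requests when training starts. A minimal sketch of a readiness check is below. The URL in the usage comment, including port 30000 and the `/health` path, is an assumption about the server's defaults and should be adjusted to the actual launch settings; `wait_for_server` is a hypothetical helper, not part of TRL or SGLang.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=60.0, interval_s=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Returns True if the server answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not reachable yet (connection refused, timeout, ...).
            pass
        time.sleep(interval_s)
    return False


# Example usage (assumed host, port and path; adjust to your launch command):
# if not wait_for_server("http://127.0.0.1:30000/health", timeout_s=300):
#     raise RuntimeError("server did not become ready in time")
```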
docs/source/grpo_trainer.md
Outdated
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train[:10%]”)
split="train[:10%]” should be split="train[:10%]"
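Typographic quotes like this are easy to miss in documentation snippets, and Python rejects them as string delimiters. A small sketch for spotting and normalizing them follows. Both helper names are hypothetical, and the set of characters handled is limited to the four common curly quotes.

```python
# Map typographic quotes to their plain ASCII equivalents.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def find_smart_quotes(text):
    """Return (line_number, column, character) for each typographic quote."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, ch))
    return hits


def normalize_quotes(text):
    """Replace typographic quotes with plain ASCII quotes."""
    return text.translate(str.maketrans(SMART_QUOTES))
```

Running `find_smart_quotes` over the docs file before merging would flag the line discussed here; `normalize_quotes` produces the corrected text.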
Turn to the online server API Usage
add test and fix bugs in result parsing
Pass First test with fixing _update_sglang_weights
Remove checkpoints from tracking and add to .gitignore
config to run on single gpu successfully
Update code to align with vllm
save model and update weight
save model only main process
A runnable update_from_tensor version
fix performance issue
resolve comment: help strings
resolve comment: help strings
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_trainer.py
call raise_for_status
remove duplicate doc string
formatting
add sglang to extras
formatting
import requests only when sglang is available
formatting
undo formatting
undo formatting
more undo
last one!
add initial docs
add sglang
last one now
new line
delete test scripts
Update setup.cfg
Update setup.cfg
intiial sglang-serve cli script
Update trl/trainer/grpo_trainer.py
remove dead code
Co-authored-by: Kashif Rasul <[email protected]>
debug GRPO trainer
change num_processes
update how to run sglang
da2d521 to d2ed28b
What does this PR do?
This PR re-implements and rebases the original work from #3370, which adds SGLang support as a rollout engine for GRPO training in TRL.
We have rebased the original PR onto the latest main branch and are testing the behavior of weight updates.
See also #103
@ChangyiYang
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.