Feature: Add SGLang support for GRPO Trainer #3627

PrinsYin · 2025-06-21T23:26:18Z

What does this PR do?

This PR re-implements and rebases the original work from #3370, which adds SGLang support as a rollout engine for GRPO training in TRL.

We have rebased the original PR onto the latest main branch and are testing the behavior of weight updates.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

ChangyiYang · 2025-06-22T00:18:10Z

Test results will be posted in this comment:

Qwen/Qwen2-0.5B-Instruct
https://api.wandb.ai/links/changyiy-cmu/fjd0slfq around 0.3 seconds

More testing is ongoing.

kashif · 2025-06-22T07:59:25Z

shall we close the other PR?

zkx06111 · 2025-06-22T16:09:53Z

setup.cfg

    requests; python_version < "3.13"
    uvicorn; python_version < "3.13"

+sglang =


Should this be sglang[all]>=0.4.6post2?
Without the [all], I'm getting lots of ModuleNotFoundErrors.
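A quick way to see which imports are failing, before deciding whether the extra should pin `sglang[all]`, is to probe the import system directly. The sketch below is only an illustration: the helper name `missing_modules` and the module names passed to it are hypothetical, and the thread does not say which modules were actually missing.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that the import system cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name whose parent package is absent raises
            # instead of returning None, so treat that as missing too.
            found = False
        if not found:
            missing.append(name)
    return missing


# Hypothetical usage: replace the list with the modules named in the tracebacks.
# print(missing_modules(["sglang", "requests", "uvicorn"]))
```

Running this once in an environment installed with plain `sglang` and once with `sglang[all]` would show whether the extra closes the gap.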

zkx06111 · 2025-06-22T16:13:11Z

docs/source/grpo_trainer.md

+# start sglang-server
+python3 -m sglang.launch_server --model-path qwen/qwen2.5-7b-instruct
+
+# run "export CUDA_VISIBLE_DEVICES"
+# run script
+python3 grpo_test.py
+```


where is grpo_test.py?
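Whatever the training script ends up being called, it has to start after the server launched above is accepting requests. A minimal sketch of that wait, using only the standard library, is below. The helper name `wait_for_server` is hypothetical, and the URL in the usage comment (port and health path) is an assumption that should be checked against the SGLang server's actual settings.

```python
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timed out, or HTTP error: not ready yet.
            pass
        time.sleep(interval)
    return False


# Hypothetical usage (URL is an assumption, not taken from this PR):
# if not wait_for_server("http://localhost:30000/health"):
#     raise RuntimeError("server did not become ready in time")
```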

zkx06111 · 2025-06-22T16:14:32Z

docs/source/grpo_trainer.md

+from trl import GRPOConfig, GRPOTrainer
+
+
+dataset = load_dataset("trl-lib/tldr", split="train[:10%]”)


split="train[:10%]” should be split="train[:10%]"
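The curly quote matters because the snippet no longer parses once it is pasted into a file. A small check like the following, which only compiles the text and never executes it, could catch this kind of typo in doc snippets. The helper name `compiles` is hypothetical.

```python
def compiles(source: str) -> bool:
    """Return True if `source` parses as Python; nothing is executed."""
    try:
        compile(source, "<doc-snippet>", "exec")
    except SyntaxError:
        return False
    return True


# \u201d is the right double quotation mark that appears in the docs diff.
bad = 'dataset = load_dataset("trl-lib/tldr", split="train[:10%]\u201d)'
good = 'dataset = load_dataset("trl-lib/tldr", split="train[:10%]")'

print(compiles(bad), compiles(good))
```

With the curly quote the string literal is never closed, so the first check fails while the corrected line passes.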

Turn to the online server API Usage
add test and fix bugs in result parsing
Pass First test with fixing _update_sglang_weights
Remove checkpoints from tracking and add to .gitignore
config to run on single gpu successfully
Update code to align with vllm
save model and update weight
save model only main process
A runnable update_from_tensor version
fix performance issue
resolve comment: help strings
resolve comment: help strings
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_config.py
Update trl/trainer/grpo_trainer.py
call raise_for_status
remove duplicate doc string
formatting
add sglang to extras
formatting
import requests only when sglang is available
formatting
undo formatting
undo formatting
more undo
last one!
add initial docs
add sglang
last one now
new line
delete test scripts
Update setup.cfg
Update setup.cfg
intiial sglang-serve cli script
Update trl/trainer/grpo_trainer.py
remove dead code Co-authored-by: Kashif Rasul <[email protected]>
debug GRPO trainer
change num_processes
update how to run sglang

zkx06111 reviewed Jun 22, 2025

PrinsYin force-pushed the sglang-server-rebase branch from da2d521 to d2ed28b · June 22, 2025 17:26

qgallouedec mentioned this pull request Nov 5, 2025

[Feat] Suppport SGLang as rollout engine of GRPO trainer #3370

Closed

8 tasks

5 participants
