📏 Completion length logging fix + remainder logging fix #3482

shirinyamani · 2025-05-22T17:16:48Z

What does this PR do?

I'm testing with this tiny script on two GPUs:

for server:

CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen3-0.6B

training script:

from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig

dataset = load_dataset("trl-lib/tldr", split="train[:1%]")

# Dummy reward function: count the number of unique characters in the completions
def reward_num_unique_chars(completions, **kwargs):
    return [len(set(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="qwen3-mask-completions",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
    use_vllm=True,
    mask_truncated_completions=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=training_args,
    reward_funcs=reward_num_unique_chars,
    train_dataset=dataset,
)

trainer.train()

for training:

CUDA_VISIBLE_DEVICES=1 accelerate launch trl/trainer/test.py

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-05-22T17:23:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-05-26T23:01:23Z

It seems like the PR isn't working, see the CI

qgallouedec · 2025-05-27T16:40:47Z

I'm taking over this PR so that we can have this fix in the next release.

qgallouedec · 2025-05-27T20:05:49Z

TLDR: don't use gather_for_metrics (use gather instead) in GRPO

I just found a subtle bug in GRPO (thankfully with no major consequences). We rewrite the sampler to ensure there's no remainder (see here), but accelerate doesn't detect that and assumes a remainder still exists in the dataset. As a result, using gather_for_metrics is unsafe: it tries to truncate the gathered data based on a nonexistent remainder, which can lead to a mismatched shape error (see this CI).
It's pretty amazing that it hasn't caused any bugs before.
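
To make the failure mode concrete, here is a minimal sketch in plain Python. It is a simplified model of the truncation behavior described above, not accelerate's actual code: `gather`, `gather_for_metrics_sketch`, and all the numbers are hypothetical stand-ins. It assumes two processes, a dataset of 10 samples, and a global batch of 8, so the bookkeeping believes the last batch holds 10 % 8 = 2 real samples even though the custom sampler always yields full batches.

```python
def gather(per_process):
    """Concatenate the per-process lists, like a plain all-gather."""
    return [x for shard in per_process for x in shard]


def gather_for_metrics_sketch(per_process, end_of_dataloader, remainder):
    """Simplified model: on the last batch, keep only `remainder` samples."""
    data = gather(per_process)
    if end_of_dataloader and remainder > 0:
        return data[:remainder]
    return data


# Hypothetical sizes: the bookkeeping derives a remainder from the dataset
# length, unaware that the custom sampler never produces a partial batch.
dataset_size = 10
global_batch_size = 8
assumed_remainder = dataset_size % global_batch_size

# Two processes, four completions each (a full batch, no padding samples).
completion_lengths = [[12, 30, 7, 25], [18, 9, 41, 16]]
rewards = [[3.0, 5.0, 2.0, 4.0], [6.0, 1.0, 7.0, 2.0]]

all_rewards = gather(rewards)
all_lengths = gather_for_metrics_sketch(
    completion_lengths, end_of_dataloader=True, remainder=assumed_remainder
)

# The two gathered lists no longer line up, which is the shape mismatch.
print(len(all_rewards), len(all_lengths))
```

In this model the plain gather keeps all 8 values while the metrics variant keeps only 2, so any code that combines the two tensors fails on the last batch of an epoch. That is why switching to gather is the safe choice when the sampler already guarantees full batches.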

completion-mask log fix

586c7b0

shirinyamani changed the title ~~completion-mask log fix~~ 🔧 completion mask logging fix May 22, 2025

CI error fix

4a775fb

shirinyamani requested review from edbeeching and qgallouedec May 22, 2025 18:38

shirinyamani and others added 3 commits May 23, 2025 12:30

Merge branch 'main' into com-mask-log

5ec198f

com-mask

441ca98

com-mask

616bef9

qgallouedec reviewed May 24, 2025

trl/trainer/grpo_trainer.py (outdated, resolved)

Update trl/trainer/grpo_trainer.py

06be109

Co-authored-by: Quentin Gallouédec <[email protected]>

qgallouedec reviewed May 26, 2025

trl/trainer/grpo_trainer.py (resolved)

qgallouedec and others added 3 commits May 27, 2025 09:40

Merge branch 'main' into com-mask-log

1d1a9aa

fix and refactor a bit

a1b9bfd

fix gather_for_metrics

138840b

qgallouedec changed the title ~~🔧 completion mask logging fix~~ 📏 Completion length logging fix May 27, 2025

qgallouedec changed the title ~~📏 Completion length logging fix~~ 📏 Completion length logging fix + remainder logging fix May 27, 2025

Merge branch 'main' into com-mask-log

2b0861e

qgallouedec approved these changes May 27, 2025

it's agg

7346657

qgallouedec merged commit 17a9069 into main May 27, 2025
11 checks passed

qgallouedec deleted the com-mask-log branch May 27, 2025 21:31
