[GRPO] Fix loss normalization #2881

edbeeching · 2025-02-17T11:07:04Z

What does this PR do?

The current GRPO implementation normalizes the loss per sequence; this PR changes it to global normalization.

Details:
In Causal Language Modelling, we typically use global normalization to scale the loss, so that each unmasked token's loss contributes equally to the total loss. Example from the transformers codebase: https://github.com/huggingface/transformers/blob/fae0f3dde83b7a54441f7a5bb0fc45d354fe81ce/src/transformers/loss/loss_utils.py#L24-L29
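To make the difference concrete, here is a minimal sketch in plain Python. It is not the TRL code: the function names and the toy numbers are illustrative assumptions. It compares averaging each sequence first against averaging over all unmasked tokens in the batch at once.

```python
def per_sequence_mean(per_token_loss, mask):
    """Average each sequence over its own unmasked tokens, then average the sequences.

    Assumes every sequence has at least one unmasked token.
    """
    seq_means = []
    for losses, m in zip(per_token_loss, mask):
        n_tokens = sum(m)
        seq_means.append(sum(l * k for l, k in zip(losses, m)) / n_tokens)
    return sum(seq_means) / len(seq_means)


def global_mean(per_token_loss, mask):
    """Sum the loss over every unmasked token in the batch, then divide by the total token count."""
    total = sum(
        l * k
        for losses, m in zip(per_token_loss, mask)
        for l, k in zip(losses, m)
    )
    n_tokens = sum(sum(m) for m in mask)
    return total / n_tokens


# Toy batch: one short completion (1 token) and one long completion (4 tokens).
per_token_loss = [[4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
mask = [[1, 0, 0, 0], [1, 1, 1, 1]]

print(per_sequence_mean(per_token_loss, mask))  # (4.0 + 1.0) / 2 = 2.5
print(global_mean(per_token_loss, mask))        # (4.0 + 4 * 1.0) / 5 = 1.6
```

In this toy batch the single token of the short completion carries half of the per-sequence loss, but only one fifth of the token count under global normalization.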

HuggingFaceDocBuilderDev · 2025-02-17T11:11:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BramVanroy · 2025-03-15T10:51:12Z

As seen on Twitter, some discussion about this change: https://x.com/danielhanchen/status/1900844864134410695

gameofdimension · 2025-03-22T11:39:27Z

should we make it configurable?

* fix GRPO loss normalization
* fix sum dim
* fix loss= repeated

## What does this PR do?

Adds support for token-level loss (i.e., the `token_mean` loss reduction type) as introduced by DAPO. With `token_mean` loss reduction, all tokens in all sequences contribute equally to the loss. The loss reduction type is configurable via `trainer.algorithm.loss_reduction`, but the default is now `token_mean` rather than our previous implementation (`sequence_mean`).

This loss reduction is what the community is standardizing on as the default (TRL's [default](huggingface/trl#2881), verl's [default](https://github.com/volcengine/verl/blob/517cc23c9dbb0da5c2cd2b012466790e29cb781a/verl/trainer/config/actor/actor.yaml#L63)).

Wandb report comparing `token_mean` vs `sequence_mean`: https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k/reports/Token-level-loss-token_mean---VmlldzoxMzYwMDc4MQ

The only plot with a notable difference is `policy_loss`, which is much larger for `token_mean` than for `sequence_mean`:

<img width="312" height="274" alt="Screenshot 2025-07-15 at 9 52 57 AM" src="https://github.com/user-attachments/assets/40f94cb6-c5e5-47f6-9b09-a076811746a0" />

However, this `policy_loss` matches the magnitude of the `pg_loss` we observe in verl:

<img width="980" height="611" alt="Screenshot 2025-07-15 at 9 54 39 AM" src="https://github.com/user-attachments/assets/53714573-2b21-4e67-b30a-dd3648279438" />

---------

Co-authored-by: Sumanth R Hegde <[email protected]>
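One way to read the two reduction types is as different weights placed on each token's loss. The sketch below is a plain-Python illustration under that reading. Only the mode names `token_mean` and `sequence_mean` come from the description above; the `token_weights` helper is hypothetical and is not part of SkyRL, TRL, or verl.

```python
def token_weights(mask, loss_reduction="token_mean"):
    """Return the weight each token's loss receives in the batch loss.

    `mask` is a list of 0/1 lists, where 1 marks a token that counts toward the loss.
    Hypothetical helper for illustration; assumes every sequence has at least one such token.
    """
    if loss_reduction == "token_mean":
        # Every counted token in the batch gets the same weight.
        total = sum(sum(m) for m in mask)
        return [[k / total for k in m] for m in mask]
    if loss_reduction == "sequence_mean":
        # Every sequence gets the same weight, split evenly across its own tokens.
        n_seqs = len(mask)
        return [[k / (sum(m) * n_seqs) for k in m] for m in mask]
    raise ValueError(f"unknown loss_reduction: {loss_reduction!r}")


# Toy batch: a 2-token completion and a 4-token completion.
mask = [[1, 1, 0, 0], [1, 1, 1, 1]]

for mode in ("token_mean", "sequence_mean"):
    print(mode, token_weights(mask, mode))
```

Under `sequence_mean`, each token of the 2-token completion is weighted 0.25 while each token of the 4-token completion is weighted 0.125, so tokens in longer completions count for less. Under `token_mean`, all six counted tokens share the same weight.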

edbeeching added 3 commits February 17, 2025 11:55

fix GRPO loss normalization

0e10950

fix sum dim

de44135

fix loss= repeated

b082791

edbeeching requested a review from qgallouedec February 17, 2025 11:08

qgallouedec approved these changes Feb 17, 2025

edbeeching merged commit 293b620 into main Feb 17, 2025
14 checks passed

edbeeching deleted the fix-grpo-loss-normalization branch February 17, 2025 12:26

kashif mentioned this pull request Feb 18, 2025

Grpo loss linkedin/Liger-Kernel#553

Merged

3 tasks

qgallouedec mentioned this pull request Mar 1, 2025

Loss normalization in GRPOTrainer #2995

Closed

This was referenced Mar 23, 2025

Fix length bias for Dr GRPO #3138

Closed

Add support to new DAPO method #3130

Closed

🤝 Align GRPO equation doc with the implementation #3151

Merged

edbeeching mentioned this pull request Apr 4, 2025

[GRPO] Adds an option to scale the loss by a constant factor #3231

Closed

yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

[GRPO] Fix loss normalization (huggingface#2881)

e6e89e7

* fix GRPO loss normalization
* fix sum dim
* fix loss= repeated

tyler-griggs mentioned this pull request Jul 15, 2025

Support token-level loss, make default NovaSky-AI/SkyRL#90

Merged
