🏗️ Refactor top-entropy in GRPO #3727
Conversation
… and update related documentation for clarity on entropy masking.
I'm wondering whether the default of 1 actually gives better results, because it seems like this would mask all but the highest-entropy tokens. I think that's because the threshold effectively becomes max(entropies of non-padding tokens), so only the tokens with exactly the highest entropy would be marked True?
Actually it's the opposite; I fixed the doc here:
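For concreteness, here's a minimal sketch of how a quantile-based entropy mask can behave (a hypothetical re-implementation for illustration, not the PR's actual code; it assumes the threshold is taken at the `(1 - top_entropy_quantile)` quantile, so that a quantile of 1.0 keeps every non-padding token rather than only the highest-entropy ones):

```python
import numpy as np

def get_high_entropy_mask(entropies, pad_mask, top_entropy_quantile):
    """Keep roughly the top-`top_entropy_quantile` fraction of
    highest-entropy, non-padding tokens (illustrative sketch).

    entropies: 1-D array of per-token entropies
    pad_mask:  1-D boolean array, True for real (non-padding) tokens
    """
    non_pad = entropies[pad_mask]
    # Threshold at the (1 - q) quantile: q=1.0 makes the threshold the
    # minimum non-padding entropy, so every real token is kept; q=0.2
    # keeps roughly the top 20% highest-entropy tokens.
    threshold = np.quantile(non_pad, 1.0 - top_entropy_quantile)
    return (entropies >= threshold) & pad_mask

entropies = np.array([0.1, 0.5, 0.9, 0.3, 0.0])
pad_mask = np.array([True, True, True, True, False])

print(get_high_entropy_mask(entropies, pad_mask, 1.0))   # all real tokens kept
print(get_high_entropy_mask(entropies, pad_mask, 0.25))  # only the top-entropy token
```

Under this convention the reviewer's worry would apply to a quantile of 0 (or to thresholding at the quantile directly), not to the default of 1.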
Thanks for the changes! I'll take a look later today.
The test case for training a model with the entropy mask needs to be updated to use the new argument.
What does this PR do?
Some minor changes, a late review from #3563:

- `_compute_entropy_mask` -> `get_high_entropy_mask`
- `token_entropy_percentile_threshold` -> `top_entropy_quantile`: "top" aligns better with the top-20% message (it's more a quantile than a percentile here)
- `_get_per_token_logps_and_entropies` returning a tuple, but personal preference

cc @pramodith, happy to have your thoughts on this
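To illustrate the tuple-returning shape (a hypothetical sketch of the computation, not the trainer's actual implementation; the function name and input layout here are assumptions for the example):

```python
import numpy as np

def get_per_token_logps_and_entropies(probs, token_ids):
    """Sketch: for each position, return the log-prob of the sampled token
    and the entropy of the full next-token distribution, as a tuple.

    probs:     (seq_len, vocab_size) array of next-token probabilities
    token_ids: (seq_len,) array of sampled token ids
    """
    rows = np.arange(len(token_ids))
    # Log-prob of the token actually sampled at each position.
    logps = np.log(probs[rows, token_ids])
    # Entropy of each full distribution: -sum(p * log p), clipped for safety.
    entropies = -np.sum(probs * np.log(np.clip(probs, 1e-12, None)), axis=-1)
    return logps, entropies

# Uniform distribution over 4 tokens at both positions:
probs = np.full((2, 4), 0.25)
token_ids = np.array([0, 3])
logps, entropies = get_per_token_logps_and_entropies(probs, token_ids)
```

Returning a tuple keeps the two arrays unpackable in one line at the call site, which is the preference being discussed.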