
🏗️ Refactor top-entropy in GRPO #3727


Merged
qgallouedec merged 12 commits into main from clean-entropy on Jul 19, 2025
Conversation

qgallouedec
Member

@qgallouedec qgallouedec commented Jul 12, 2025

What does this PR do?

Some minor changes, a late review of #3563:

  1. Don't use/unit-test a private method: rename _compute_entropy_mask -> get_high_entropy_mask
  2. Rename token_entropy_percentile_threshold -> top_entropy_quantile to better align with the top-20% message (it's more a quantile than a percentile here); a minimal sketch follows at the end of this comment
  3. Fix the doc: link and var name (rho, not tau)
  4. I prefer _get_per_token_logps_and_entropies returning a tuple, but that's a personal preference

cc @pramodith, happy to have your thoughts on this
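
For context on the rename in item 2, here is a minimal standalone sketch of what a top-entropy quantile mask can look like. This is hypothetical illustration code, not necessarily TRL's exact implementation; tensor shapes and the function signature are assumptions:

```python
import torch

def get_high_entropy_mask(entropies: torch.Tensor, mask: torch.Tensor, top_quantile: float) -> torch.Tensor:
    """Keep only the top `top_quantile` fraction of highest-entropy non-padding tokens.

    entropies:    (batch, seq_len) per-token entropies
    mask:         (batch, seq_len) completion mask (1 = real token, 0 = padding)
    top_quantile: fraction of tokens to keep; 0.2 keeps the top 20%,
                  1.0 keeps every token (no entropy masking)
    """
    # Only consider non-padding tokens when computing the threshold
    non_pad_entropies = entropies[mask.bool()].float()
    # Tokens at or above the (1 - top_quantile) quantile are kept
    entropy_threshold = torch.quantile(non_pad_entropies, 1.0 - top_quantile)
    return (entropies >= entropy_threshold) & mask.bool()
```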

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@shirinyamani shirinyamani left a comment


I'm wondering whether the default of 1 gives better results? It seems like this would mask all but the highest-entropy token: with top_entropy_quantile = 1, the threshold would be max(entropies of non-padding tokens), so only the tokens with exactly the highest entropy would be marked True?

@qgallouedec
Member Author

I'm wondering whether the default of 1 gives better results? It seems like this would mask all but the highest-entropy token?

Actually it's the opposite; I fixed the doc here:
5621e1a
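
To illustrate the semantics behind this reply, using the hypothetical sketch above (expected outputs shown as comments):

```python
entropies = torch.tensor([[0.1, 0.5, 0.9, 2.0]])
mask = torch.ones_like(entropies, dtype=torch.long)

get_high_entropy_mask(entropies, mask, top_quantile=1.0)
# tensor([[True, True, True, True]])    -> the default of 1.0 keeps ALL tokens
get_high_entropy_mask(entropies, mask, top_quantile=0.25)
# tensor([[False, False, False, True]]) -> only the top 25% highest-entropy tokens
```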

@pramodith
Collaborator

Thanks for the changes! I'll take a look later today.

@pramodith
Collaborator

The test case corresponding to training a model with the entropy mask needs to be updated to use the new argument top_entropy_quantile for the test to pass.
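
For reference, a minimal sketch of how the renamed argument would appear in a config (assuming TRL's GRPOConfig; the output_dir value is illustrative):

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-entropy-test",
    top_entropy_quantile=0.2,  # keep only the top 20% highest-entropy tokens in the loss
)
```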

@qgallouedec qgallouedec merged commit 116ec49 into main Jul 19, 2025
10 of 11 checks passed
@qgallouedec qgallouedec deleted the clean-entropy branch July 19, 2025 20:48
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025