🏗️ Refactor top-entropy in GRPO #3727
Conversation
… and update related documentation for clarity on entropy masking.
I'm wondering whether the default of 1 actually gives better results, because it seems like this would mask all but the highest-entropy tokens. I think that's because the threshold effectively becomes max(entropies of non-padding tokens), so only the tokens with exactly the highest entropy would be marked True?
Actually it's the opposite; I fixed the doc here:
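For concreteness, here's a minimal sketch of how a quantile-based entropy mask can behave (a hypothetical re-implementation for illustration, not the PR's actual code; it assumes the threshold is taken at the `(1 - top_entropy_quantile)` quantile, so that a quantile of 1.0 keeps every non-padding token rather than only the highest-entropy ones):

```python
import numpy as np

def get_high_entropy_mask(entropies, pad_mask, top_entropy_quantile):
    """Keep roughly the top-`top_entropy_quantile` fraction of
    highest-entropy, non-padding tokens (illustrative sketch).

    entropies: 1-D array of per-token entropies
    pad_mask:  1-D boolean array, True for real (non-padding) tokens
    """
    non_pad = entropies[pad_mask]
    # Threshold at the (1 - q) quantile: q=1.0 makes the threshold the
    # minimum non-padding entropy, so every real token is kept; q=0.2
    # keeps roughly the top 20% highest-entropy tokens.
    threshold = np.quantile(non_pad, 1.0 - top_entropy_quantile)
    return (entropies >= threshold) & pad_mask

entropies = np.array([0.1, 0.5, 0.9, 0.3, 0.0])
pad_mask = np.array([True, True, True, True, False])

print(get_high_entropy_mask(entropies, pad_mask, 1.0))   # all real tokens kept
print(get_high_entropy_mask(entropies, pad_mask, 0.25))  # only the top-entropy token
```

Under this convention the reviewer's worry would apply to a quantile of 0 (or to thresholding at the quantile directly), not to the default of 1.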
Thanks for the changes! I'll take a look later today.
The test case for training a model with the entropy mask needs to be updated to use the new argument.
What does this PR do?
Some minor changes, a late review from #3563:

- `_compute_entropy_mask` -> `get_high_entropy_mask`
- `token_entropy_percentile_threshold` -> `top_entropy_quantile`: "top" aligns better with the top-20% message (it's more a quantile than a percentile here)
- `_get_per_token_logps_and_entropies` returning a tuple, but personal preference

cc @pramodith, happy to have your thoughts on this
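To illustrate the tuple-returning shape (a hypothetical sketch of the computation, not the trainer's actual implementation; the function name and input layout here are assumptions for the example):

```python
import numpy as np

def get_per_token_logps_and_entropies(probs, token_ids):
    """Sketch: for each position, return the log-prob of the sampled token
    and the entropy of the full next-token distribution, as a tuple.

    probs:     (seq_len, vocab_size) array of next-token probabilities
    token_ids: (seq_len,) array of sampled token ids
    """
    rows = np.arange(len(token_ids))
    # Log-prob of the token actually sampled at each position.
    logps = np.log(probs[rows, token_ids])
    # Entropy of each full distribution: -sum(p * log p), clipped for safety.
    entropies = -np.sum(probs * np.log(np.clip(probs, 1e-12, None)), axis=-1)
    return logps, entropies

# Uniform distribution over 4 tokens at both positions:
probs = np.full((2, 4), 0.25)
token_ids = np.array([0, 3])
logps, entropies = get_per_token_logps_and_entropies(probs, token_ids)
```

Returning a tuple keeps the two arrays unpackable in one line at the call site, which is the preference being discussed.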