What are the reasons num_iterations (μ) defaults to 1 in the GRPO trainer? #3548
Unanswered
JenWei0312 asked this question in Q&A
Replies: 1 comment · 5 replies
-
Hi! I've been studying the GRPO implementation and noticed that num_iterations defaults to 1. From my understanding of the DeepSeek paper, setting μ > 1 is a key feature that allows multiple policy updates from a single generation batch, improving computational efficiency.

Could you help me understand the reasoning behind this default? I'm asking because many users tend to use default values, and if my understanding is correct, they might be missing out on GRPO's efficiency benefits.

Thanks for the great work on this trainer, and please let me know if I missed anything. 🙏
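For readers unfamiliar with μ: the toy sketch below (placeholder functions only, not the TRL implementation) shows the loop structure being discussed. Generation and scoring happen once per batch; μ controls how many optimization steps reuse that cached batch.

```python
# Toy sketch of the role of mu (num_iterations) in GRPO-style training.
# Every function here is a placeholder so the file runs end to end;
# only the loop structure is the point.
import random

def sample_completions(prompts, group_size):
    # Placeholder for the expensive step: sample a group of completions per prompt.
    return [[f"{p}-completion-{i}" for i in range(group_size)] for p in prompts]

def group_advantages(completions):
    # Placeholder for reward scoring plus group-relative advantage normalization.
    return [[random.gauss(0.0, 1.0) for _ in group] for group in completions]

def policy_update(step, advantages):
    # Placeholder for one clipped policy-gradient update on the cached batch.
    mean_abs = sum(abs(a) for g in advantages for a in g) / sum(len(g) for g in advantages)
    print(f"  update {step}: mean |A| = {mean_abs:.3f}")

def train(prompt_batches, mu=1, group_size=4):
    for prompts in prompt_batches:
        completions = sample_completions(prompts, group_size)  # generated once per batch
        advantages = group_advantages(completions)             # scored once per batch
        for step in range(mu):                                 # reused mu times
            policy_update(step, advantages)

train([["p1", "p2"]], mu=3)
```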
-
Great question: yes, μ=1 is the default in the DeepSeek Math paper and in the current implementation. It does make sense to increase it. Also, another way to improve efficiency is by increasing …
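Assuming a recent trl release where GRPOConfig exposes num_iterations, raising μ is a one-line config change. A minimal sketch (the model id, dataset, and reward function below are illustrative, not taken from this thread):

```python
# Sketch: raising mu (num_iterations) via GRPOConfig.
# Assumes a trl version that exposes `num_iterations`; the model,
# dataset, and reward function are illustrative placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions (placeholder reward function).
    return [-float(len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-mu4",   # illustrative output directory
    num_generations=8,       # completions sampled per prompt (group size G)
    num_iterations=4,        # mu: optimization steps reusing each generation batch (default 1)
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # illustrative small model
    reward_funcs=reward_len,
    args=config,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```

With μ > 1 the same generations are reused for several clipped updates, so the per-step generation cost is amortized roughly by a factor of μ.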