1 parent 90c7876, commit 9a1e6a4
trl/trainer/grpo_config.py
@@ -553,7 +553,7 @@ class GRPOConfig(TrainingArguments):
  metadata={
  "help": "ρ parameter from Beyond the 80/20 Rule. Keeps in the policy loss term only the top-ρ quantile of "
  "tokens by entropy of the probability distribution at each sequence position, improving results. Range: "
- "[0.0-1.0]. A value of `1.0` masks all but the highest entropy token; `0.0` keeps all tokens. The paper "
+ "[0.0-1.0]. A value of `0.0` masks all but the highest entropy token; `1.0` keeps all tokens. The paper "
  "recommends a value of `0.2`. If used with `mask_truncated_completions=True`, only tokens from "
  "non-truncated completions are considered."
  },
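
The corrected wording can be checked with a small sketch. This is not TRL's implementation: `quantile` and `top_entropy_mask` are hypothetical helper names, the entropies are made-up values, and the sketch assumes the mask keeps every token whose entropy is at or above the (1 - ρ) quantile of the token entropies, computed with linear interpolation.

```python
import math


def quantile(values, q):
    """Linearly interpolated q-quantile of a non-empty list, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def top_entropy_mask(entropies, rho):
    """Hypothetical helper: True for tokens kept in the policy loss.

    Assumption: a token is kept when its entropy is >= the (1 - rho)
    quantile, so rho is the fraction of highest-entropy tokens retained.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0.0, 1.0]")
    threshold = quantile(entropies, 1.0 - rho)
    return [e >= threshold for e in entropies]


# Made-up per-token entropies for one completion.
entropies = [0.1, 0.9, 0.3, 2.0, 0.5]

print(top_entropy_mask(entropies, 1.0))  # [True, True, True, True, True]
print(top_entropy_mask(entropies, 0.0))  # [False, False, False, True, False]
```

Under these assumptions, `1.0` places the threshold at the minimum entropy, so every token is kept, while `0.0` places it at the maximum, so only the highest-entropy token survives. That matches the `+` line of the hunk rather than the `-` line.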