Commit 9a1e6a4

1787648106, lunzhongwang, and LeonEricsson authored
Correction parameter description (#3803)
Co-authored-by: lunzhongwang <[email protected]> Co-authored-by: LeonEricsson <[email protected]>
1 parent 90c7876 commit 9a1e6a4

1 file changed: +1 -1 lines changed

trl/trainer/grpo_config.py

Lines changed: 1 addition & 1 deletion
@@ -553,7 +553,7 @@ class GRPOConfig(TrainingArguments):
         metadata={
             "help": "ρ parameter from Beyond the 80/20 Rule. Keeps in the policy loss term only the top-ρ quantile of "
             "tokens by entropy of the probability distribution at each sequence position, improving results. Range: "
-            "[0.0-1.0]. A value of `1.0` masks all but the highest entropy token; `0.0` keeps all tokens. The paper "
+            "[0.0-1.0]. A value of `0.0` masks all but the highest entropy token; `1.0` keeps all tokens. The paper "
             "recommends a value of `0.2`. If used with `mask_truncated_completions=True`, only tokens from "
             "non-truncated completions are considered."
         },
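
The corrected direction of ρ can be checked with a small standalone sketch. This is not the TRL implementation: the helper names `linear_quantile` and `top_entropy_mask` are hypothetical, and the sketch assumes the mask keeps every token whose entropy is at or above the (1 - ρ) quantile of the per-token entropies, using linear interpolation between order statistics.

```python
import math


def linear_quantile(values, q):
    """Return the q-quantile of values, interpolating linearly between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def top_entropy_mask(entropies, rho):
    """Hypothetical helper: True marks tokens kept in the policy loss term.

    Assumption: a token is kept when its entropy is >= the (1 - rho) quantile.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0.0, 1.0]")
    threshold = linear_quantile(entropies, 1.0 - rho)
    return [h >= threshold for h in entropies]


# Made-up per-token entropies for one completion.
entropies = [0.1, 0.9, 0.5, 0.3, 0.7]

keep_all = top_entropy_mask(entropies, 1.0)   # rho = 1.0: threshold is the minimum
keep_top = top_entropy_mask(entropies, 0.0)   # rho = 0.0: threshold is the maximum
keep_20 = top_entropy_mask(entropies, 0.2)    # rho = 0.2: roughly the top 20%
```

Under these assumptions, `rho=1.0` puts the threshold at the minimum entropy, so every token is kept, while `rho=0.0` puts it at the maximum, so only the highest-entropy token survives. That matches the corrected help text rather than the original wording.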
