Fix #941: Correctly define choices for --reward-model as a tuple #974
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #941
Problem Description:
In the e2e_rlhf.py script, the choices option for the --reward-model parameter was incorrectly defined as a string ("350m") instead of a single-element tuple ("350m",). This caused argparse to interpret it as a sequence of individual characters ('(', '3', '5', '0', 'm', ')') rather than a valid list of options.
Solution:
As suggested in issue #941, this commit changes the definition of choices to ("350m",) by adding a trailing comma after the string. This ensures it is correctly parsed as a tuple containing the single element "350m".
Testing:
The script correctly accepts the parameter when run with the valid option --reward-model 350m.
When run with an invalid option (e.g., --reward-model 123m), argparse correctly throws an "invalid choice" error.
The script correctly uses the default value when the --reward-model parameter is not provided.
Additional Notes (Optional):
Thanks to @ChenDaiwei-99 (the original reporter of the issue) for identifying the problem and suggesting the solution.