Skip to content

Conversation

@Flink-ddd
Copy link
Contributor

Fixes #941

Problem Description:
In the e2e_rlhf.py script, the choices option for the --reward-model parameter was incorrectly defined as a string ("350m") instead of a single-element tuple ("350m",). This caused argparse to interpret it as a sequence of individual characters ('(', '3', '5', '0', 'm', ')') rather than a valid list of options.

Solution:
As suggested in issue #941, this commit changes the definition of choices to ("350m",) by adding a trailing comma after the string. This ensures it is correctly parsed as a tuple containing the single element "350m".

Testing:

The script correctly accepts the parameter when run with the valid option --reward-model 350m.
When run with an invalid option (e.g., --reward-model 123m), argparse correctly throws an "invalid choice" error.
The script correctly uses the default value when the --reward-model parameter is not provided.
Additional Notes (Optional):
Thanks to @ChenDaiwei-99 (the original reporter of the issue) for identifying the problem and suggesting the solution.

@Flink-ddd Flink-ddd requested a review from tjruwase as a code owner June 5, 2025 05:10
@Flink-ddd Flink-ddd force-pushed the fix-argparser-bug-941 branch from 72b76d2 to ce5a1e8 Compare June 5, 2025 05:14
@hwchen2017
Copy link
Contributor

Hi @Flink-ddd, can you fix the formatting error?

@Flink-ddd Flink-ddd force-pushed the fix-argparser-bug-941 branch from ce5a1e8 to 5addcd9 Compare June 9, 2025 05:32
@Flink-ddd
Copy link
Contributor Author

Hi @hwchen2017, I just fixed the formatting error and submitted again, Thank you very much for reminding me.

@hwchen2017 hwchen2017 merged commit 86aeab2 into deepspeedai:master Jun 9, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A bug in argument parser.

2 participants