Fix #941: Correctly define choices for --reward-model as a tuple #974

Flink-ddd · 2025-06-05T05:10:54Z

Fixes #941

Problem Description:
In the e2e_rlhf.py script, the choices option for the --reward-model parameter was incorrectly defined as a string ("350m") instead of a single-element tuple ("350m",). This caused argparse to interpret it as a sequence of individual characters ('(', '3', '5', '0', 'm', ')') rather than a valid list of options.

Solution:
As suggested in issue #941, this commit changes the definition of choices to ("350m",) by adding a trailing comma after the string. This ensures it is correctly parsed as a tuple containing the single element "350m".

Testing:

The script correctly accepts the parameter when run with the valid option --reward-model 350m.
When run with an invalid option (e.g., --reward-model 123m), argparse correctly throws an "invalid choice" error.
The script correctly uses the default value when the --reward-model parameter is not provided.
Additional Notes (Optional):
Thanks to @ChenDaiwei-99 (the original reporter of the issue) for identifying the problem and suggesting the solution.

hwchen2017 · 2025-06-08T21:35:23Z

Hi @Flink-ddd, can you fix the formatting error?

…deepspeedai#941 Signed-off-by: Vensenmu <[email protected]>

Flink-ddd · 2025-06-09T05:37:20Z

Hi @hwchen2017, I just fixed the formatting error and submitted again, Thank you very much for reminding me.

Flink-ddd requested a review from tjruwase as a code owner June 5, 2025 05:10

Flink-ddd force-pushed the fix-argparser-bug-941 branch from 72b76d2 to ce5a1e8 Compare June 5, 2025 05:14

fix: Fix: Correctly define choices as tuple for reward-model arg Fixes …

5addcd9

…deepspeedai#941 Signed-off-by: Vensenmu <[email protected]>

Flink-ddd force-pushed the fix-argparser-bug-941 branch from ce5a1e8 to 5addcd9 Compare June 9, 2025 05:32

hwchen2017 approved these changes Jun 9, 2025

View reviewed changes

hwchen2017 merged commit 86aeab2 into deepspeedai:master Jun 9, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #941: Correctly define choices for --reward-model as a tuple #974

Fix #941: Correctly define choices for --reward-model as a tuple #974

Uh oh!

Flink-ddd commented Jun 5, 2025

Uh oh!

hwchen2017 commented Jun 8, 2025

Uh oh!

Flink-ddd commented Jun 9, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix #941: Correctly define choices for --reward-model as a tuple #974

Fix #941: Correctly define choices for --reward-model as a tuple #974

Uh oh!

Conversation

Flink-ddd commented Jun 5, 2025

Uh oh!

hwchen2017 commented Jun 8, 2025

Uh oh!

Flink-ddd commented Jun 9, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants