hey authors, thanks for open-sourcing the project.
in the arXiv paper, you mention that the outcome-based reward for code is set to the unit test pass rate, and you describe the processing of code problems (Sec. B.1).
however, this part appears to be missing from the current repository.
would you please consider releasing the relevant code so it can be reproduced? or, if that claim goes beyond what was implemented, removing the code-ORM-related content from the paper?
i raise this because directly using the pass rate without discretization is unlikely to provide a good reward signal when training code LLMs in practice.
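to make the distinction concrete, here is a minimal sketch of the two reward shapes i have in mind. the function names and the `threshold` parameter are my own assumptions for illustration; they are not taken from the paper or the repository.

```python
from typing import Sequence


def pass_rate_reward(results: Sequence[bool]) -> float:
    """Continuous reward: fraction of unit tests that passed (hypothetical helper)."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def discretized_reward(results: Sequence[bool], threshold: float = 1.0) -> float:
    """Binary reward: 1.0 only if the pass rate reaches `threshold` (assumed scheme)."""
    if not results:
        return 0.0
    return 1.0 if pass_rate_reward(results) >= threshold else 0.0


# One sampled solution that passes 3 of its 4 unit tests.
results = [True, True, True, False]
print(pass_rate_reward(results))    # 0.75
print(discretized_reward(results))  # 0.0
```

with the raw pass rate, a solution that fails one test still gets most of the reward; with the thresholded version it gets none. which of these (if either) the paper uses is exactly what i would like to confirm.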
i am not sure this part is actually implemented, and would love to reproduce it if you could provide more details. it looks non-trivial to fit into the verl pipeline.
thank you!