-
Notifications
You must be signed in to change notification settings - Fork 83
Open
Description
The reasoning gym repo (https://github.com/open-thought/reasoning-gym) contains a large set of verifiable reasoning environments for training. This issue tracks support for training in ReasoningGym environments on top of SkyRL.
I expect this integration to largely require:
- Handling the dataset format. ReasoningGym uses procedurally generated datasets (e.g., generated game maps), whereas SkyRL currently expects a complete dataset handed to the training stack at the start of training.
- Handling scoring of the completed generations. ReasoningGym already provides methods to score the model's output on some task, and this method should be used to compute rewards.
Reasoning Gym has example integrations with other training stacks that will serve as useful references.
TODOs
- Create a
ReasoningGymDataset
class that formats the procedurally generated ReasoningGym datasets into the format that SkyRL supports. See the ReasoningGym+Verl integration example for reference, and see SkyRL's dataset format docs. - Use the scoring methods provided by ReasoningGym to compute the rewards
Metadata
Metadata
Assignees
Labels
No labels