Support ReasoningGym

The reasoning gym repo (https://github.com/open-thought/reasoning-gym) contains a large set of verifiable reasoning environments for training. This issue tracks support for training in ReasoningGym environments on top of SkyRL.

I expect this integration to largely require:

1. Handling the dataset format. ReasoningGym uses procedurally generated datasets (e.g., generated game maps), whereas SkyRL currently expects a complete dataset handed to the training stack at the start of training.
2. Handling scoring of the completed generations. ReasoningGym already provides methods to score the model's output on some task, and this method should be used to compute rewards.

Reasoning Gym has example integrations [with other training stacks](https://github.com/open-thought/reasoning-gym/tree/main/examples) that will serve as useful references. 

## TODOs
- [ ] Create a `ReasoningGymDataset` class that formats the procedurally generated ReasoningGym datasets into the format that SkyRL supports. See the ReasoningGym+Verl integration [example](https://github.com/open-thought/reasoning-gym/blob/main/examples/veRL/grpo_train.py#L27) for reference, and see SkyRL's [dataset format docs](https://skyrl.readthedocs.io/en/latest/datasets/dataset-preparation.html).
- [ ] Use the scoring methods provided by ReasoningGym to compute the rewards

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support ReasoningGym #147

TODOs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support ReasoningGym #147

Description

TODOs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions