Skip to content

Support ReasoningGym #147

@tyler-griggs

Description

@tyler-griggs

The reasoning gym repo (https://github.com/open-thought/reasoning-gym) contains a large set of verifiable reasoning environments for training. This issue tracks support for training in ReasoningGym environments on top of SkyRL.

I expect this integration to largely require:

  1. Handling the dataset format. ReasoningGym uses procedurally generated datasets (e.g., generated game maps), whereas SkyRL currently expects a complete dataset handed to the training stack at the start of training.
  2. Handling scoring of the completed generations. ReasoningGym already provides methods to score the model's output on some task, and this method should be used to compute rewards.

Reasoning Gym has example integrations with other training stacks that will serve as useful references.

TODOs

  • Create a ReasoningGymDataset class that formats the procedurally generated ReasoningGym datasets into the format that SkyRL supports. See the ReasoningGym+Verl integration example for reference, and see SkyRL's dataset format docs.
  • Use the scoring methods provided by ReasoningGym to compute the rewards

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions