Commit 2a57a95

add minimal example for building training datasets (#448)
1 parent b3f81a6 commit 2a57a95

1 file changed: +18 -0 lines changed
README.md

Lines changed: 18 additions & 0 deletions
@@ -71,6 +71,24 @@ Instructions for running the evaluation scripts are provided in [eval/README.md]
Evaluation results of different reasoning models will be tracked in the [reasoning-gym-eval](https://github.com/open-thought/reasoning-gym-eval) repo.

## 🤓 Training

The `training/` directory has full details of the training runs we carried out with RG for the paper. In our experiments, we use custom Dataset code to create RG samples dynamically at runtime and to access the RG scoring function for use as a training reward.
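As a rough illustration of generating samples at runtime, the sketch below yields freshly built entries instead of reading a stored dataset. It is not the code in `training/`: `make_addition_entry`, `exact_match_score`, and `OnTheFlySamples` are hypothetical stand-ins written for this example, and the toy addition task only mimics the `question` / `answer` / `metadata` shape of an RG entry.

```python
import random
from typing import Any, Callable, Dict, Iterator

Entry = Dict[str, Any]


def make_addition_entry(rng: random.Random) -> Entry:
    """Hypothetical stand-in for an RG task generator (not part of RG)."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return {
        "question": f"What is {a} + {b}?",
        "answer": str(a + b),
        "metadata": {"source_dataset": "toy_addition"},
    }


def exact_match_score(response: str, entry: Entry) -> float:
    """Hypothetical stand-in for an RG scoring function: 1.0 on exact match."""
    return 1.0 if response.strip() == entry["answer"] else 0.0


class OnTheFlySamples:
    """Generates entries lazily on each pass instead of storing them."""

    def __init__(
        self,
        make_entry: Callable[[random.Random], Entry],
        size: int,
        seed: int = 0,
    ) -> None:
        self.make_entry = make_entry
        self.size = size
        self.seed = seed

    def __len__(self) -> int:
        return self.size

    def __iter__(self) -> Iterator[Entry]:
        # Re-seeding on every pass makes each epoch reproduce the same entries.
        rng = random.Random(self.seed)
        for _ in range(self.size):
            yield self.make_entry(rng)


if __name__ == "__main__":
    for entry in OnTheFlySamples(make_addition_entry, size=3, seed=42):
        # Scoring the reference answer against its own entry gives full reward.
        print(entry["question"], exact_match_score(entry["answer"], entry))
```

Re-seeding inside `__iter__` is one possible design choice: it keeps runs reproducible, whereas drawing a new seed per epoch would give the model unseen samples each time.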

For a more plug-and-play experience, it may be easier to build a dataset ahead of time. See `scripts/hf_dataset/` for a simple script that generates RG data and converts it to a Hugging Face dataset. To use the script, define your dataset configurations in the YAML file. You can find a list of tasks and configurable parameters in [the dataset gallery](GALLERY.md). Then run `save_hf_dataset.py` with the desired arguments.

The script saves each dataset entry as a row with `question`, `answer`, and `metadata` columns. Each RG scoring function takes the model response together with the entry object from a row and returns a reward value. Calling the scoring function is therefore simple:

```python
from reasoning_gym import get_score_answer_fn

# `dataset` is the dataset saved by the script; `generate_response` is a
# placeholder for your own model inference call.
for entry in dataset:
    model_response = generate_response(entry["question"])
    # Look up the scoring function for the task that produced this entry.
    rg_score_fn = get_score_answer_fn(entry["metadata"]["source_dataset"])
    score = rg_score_fn(model_response, entry)
    # do something with the score...
```
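When a dataset mixes several tasks, the per-entry loop can be wrapped in a small helper that returns one reward per row. The sketch below is an assumption-laden illustration, not part of RG: `score_responses` is a hypothetical name, and the scoring-function lookup is passed in as a parameter so the helper runs without the library installed. It caches one scoring function per `source_dataset` value so the lookup is not repeated for every row.

```python
from typing import Any, Callable, Dict, List, Sequence

Entry = Dict[str, Any]
ScoreFn = Callable[[str, Entry], float]


def score_responses(
    rows: Sequence[Entry],
    responses: Sequence[str],
    get_score_fn: Callable[[str], ScoreFn],
) -> List[float]:
    """Return one reward per (row, response) pair.

    `get_score_fn` maps a source dataset name to a scoring function with the
    call shape used above: score_fn(model_response, entry).
    """
    if len(rows) != len(responses):
        raise ValueError("rows and responses must have the same length")

    cache: Dict[str, ScoreFn] = {}
    rewards: List[float] = []
    for row, response in zip(rows, responses):
        name = row["metadata"]["source_dataset"]
        if name not in cache:
            # Resolve the scoring function once per task name.
            cache[name] = get_score_fn(name)
        rewards.append(float(cache[name](response, row)))
    return rewards
```

With RG installed, `get_score_answer_fn` from the snippet above would be the natural argument for `get_score_fn`; in a unit test, any function with the same call shape can be substituted.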
## 👷 Contributing

Please see [CONTRIBUTING.md](CONTRIBUTING.md).
