Great work on the paper!
We noticed that the paper mentions several models (e.g., Qwen2.5-Math-7B-base) trained with various RL methods. These models show promising performance, and access to them would be highly beneficial for the research community. Would it be possible to open-source the weights of these RL-trained models to facilitate further research and reproducibility?