Evaluation code

Hi,

I used the [Qwen-2.5-Math codebase](https://github.com/QwenLM/Qwen2.5-Math/tree/main/evaluation) to evaluate the Qwen-2.5-Math-7B model on Math datasets, but I obtained a score of only 32.23. Could you share the evaluation scripts you used in your experiments?

Thank you in advance!