Many errors occur during the training process

Hi,
Thanks for your excellent work. During training I encountered many errors and some garbled output in the samples. Is this expected? On 8 A100 GPUs, the progress estimate shows that training for 444 steps will take about 40 hours, roughly 5 minutes per step. Is that a normal training speed?

[Qwen2.5-Math-1.5B-raft-plusplus-numina_math-n4.log](https://github.com/user-attachments/files/20994986/Qwen2.5-Math-1.5B-raft-plusplus-numina_math-n4.log)
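For reference, here is a quick sanity check of the per-step figure. It is only a sketch and assumes the 40-hour estimate and the 444-step total reported during training are accurate; the helper name `minutes_per_step` is made up for illustration and is not part of the repository.

```python
def minutes_per_step(total_hours: float, total_steps: int) -> float:
    """Convert an estimated total training time into minutes per step."""
    return total_hours * 60 / total_steps


# Values taken from the training run described above (assumed accurate).
per_step = minutes_per_step(total_hours=40, total_steps=444)
print(f"{per_step:.1f} min/step")
```

So the estimate works out to a little over 5 minutes per step.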
