Many people hit training failures after 100 steps, especially in multi-turn agentic RL. For example: https://github.com/0russwest0/Agent-R1/issues/30#issuecomment-2998632482 This kind of problem is very difficult to debug because the tooling is lacking. The idea in this issue is to log the inputs and outputs of LLM calls and tool calls to an external tracking system such as wandb.
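
To make the proposal concrete, here is a rough sketch of what such logging could look like. Everything in it is an assumption for illustration: `RolloutTracer`, `CallRecord`, `wandb_sink`, the column names, and the `rollout_trace` key are hypothetical and not part of any existing trainer. The only real API used is `wandb.Table(columns=..., data=...)` and `wandb.log(..., step=...)`, and it assumes `wandb.init()` has already been called by the training script.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

# Hypothetical column layout for one logged call.
COLUMNS = ("step", "sample_id", "turn", "kind", "input", "output")


@dataclass
class CallRecord:
    step: int        # training step the rollout belongs to
    sample_id: str   # identifier of the trajectory within the batch
    turn: int        # turn index inside the multi-turn rollout
    kind: str        # "llm" or "tool"
    input: str       # prompt sent to the LLM, or arguments sent to the tool
    output: str      # LLM response, or tool result


# A sink receives (step, rows) and ships the rows to a tracking backend.
Sink = Callable[[int, List[Dict[str, object]]], None]


class RolloutTracer:
    """Buffers LLM/tool call records and flushes them once per training step."""

    def __init__(self, sink: Sink) -> None:
        self._sink = sink
        self._pending: List[CallRecord] = []

    def record(self, step: int, sample_id: str, turn: int,
               kind: str, input: str, output: str) -> None:
        self._pending.append(
            CallRecord(step, sample_id, turn, kind, input, output)
        )

    def flush(self, step: int) -> int:
        """Send all buffered records for `step` to the sink; return the count."""
        rows = [asdict(r) for r in self._pending if r.step == step]
        self._pending = [r for r in self._pending if r.step != step]
        if rows:
            self._sink(step, rows)
        return len(rows)


def wandb_sink(step: int, rows: List[Dict[str, object]]) -> None:
    """Log rows as a wandb table. Assumes wandb.init() was called elsewhere."""
    import wandb  # imported lazily so the tracer works without wandb installed

    table = wandb.Table(
        columns=list(COLUMNS),
        data=[[row[c] for c in COLUMNS] for row in rows],
    )
    wandb.log({"rollout_trace": table}, step=step)
```

In a training loop this would be used roughly as `tracer = RolloutTracer(wandb_sink)`, with `tracer.record(...)` called around each LLM generation and each tool call, and `tracer.flush(step)` called at the end of the step. Keeping the sink as a plain callable means another tracking backend could be swapped in without touching the rollout code.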