@EigenTom commented on Jun 27, 2025

This PR merges code that:

  1. Integrates Search-R1 training and tool-calling capability with verl-tool.
  2. Provides scripts and instructions for reproducing the main experiments described in the Search-R1 paper.

Progress:

  • Integrate the Search-R1 reward manager and tool-calling logic (a sketch of the rollout loop follows this list)
  • Enable async model training
  • Perform Search-R1 training with the same hyperparameters, training data, and retriever
  • Implement one-step model evaluation code for reproduction verification
  • Refactor code to disentangle Search-R1 logic from general services
  • Incorporate Search-R1's retriever into verl-tool
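For reference, here is a minimal sketch of what the Search-R1 style tool-calling rollout does; `generate` and `retrieve` are hypothetical stand-in callables rather than verl-tool APIs, and the `<search>`/`<information>` tag format follows the Search-R1 paper.

```python
# A minimal sketch of a Search-R1 style tool-calling rollout loop.
# `generate` and `retrieve` are hypothetical stand-in callables, not
# verl-tool APIs; the <search>/<information> tag format follows the
# Search-R1 paper.
import re

SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def rollout(prompt: str, generate, retrieve, max_rounds: int = 4) -> str:
    """Alternate generation and retrieval until the model stops searching."""
    trajectory = prompt
    for _ in range(max_rounds):
        completion = generate(trajectory)        # model continues the trajectory
        trajectory += completion
        match = SEARCH_TAG.search(completion)
        if match is None:                        # no tool call: final answer reached
            break
        docs = retrieve(match.group(1).strip())  # query the retriever service
        # Feed the results back as an observation (truncation to
        # max_obs_length tokens happens in the trainer, not shown here).
        trajectory += f"\n<information>{docs}</information>\n"
    return trajectory
```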

Current training curves:

We currently use the refined retriever from sglang's port in verl.
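For context, below is a minimal sketch of querying such a locally hosted retriever over HTTP; the server address, payload shape, and response format are illustrative assumptions, not the actual verl-tool or Search-R1 API.

```python
# A minimal sketch of querying a locally hosted retriever over HTTP.
# The server address, payload shape, and response format below are
# illustrative assumptions, not the actual verl-tool or Search-R1 API.
import requests

def retrieve(query: str, topk: int = 3) -> list[str]:
    """Send one query to a local dense-retriever service and return passages."""
    resp = requests.post(
        "http://127.0.0.1:8000/retrieve",        # assumed local endpoint
        json={"queries": [query], "topk": topk}, # assumed payload shape
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: one list of passage dicts per input query.
    return [doc["contents"] for doc in resp.json()["result"][0]]
```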

The model (Qwen2.5-3B) now converges significantly faster during training than in the official port.

[image: wandb training curves — the link to this wandb log]

Top: sglang's port, which reaches only ~35% on critic/score/mean at step 50.
Bottom: our results, which reach ~40% on critic/score/mean before step 40.

Note:

  1. As stated in sglang's port, Search-R1's official offline retriever has lower precision and leads to suboptimal training results.
  2. To achieve comparable or even superior performance, we extend max_response_length from 500 to 2560 and max_obs_length from 512 to 1024. This lets the model receive the full retriever results and enables a longer thinking process, potentially allowing multi-round tool calls (see the budget sketch after this list).
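To see why the old limits block multi-round tool calling, here is a small back-of-the-envelope check; the per-round thinking budget is an assumption for illustration, not a measured value.

```python
# Back-of-the-envelope check of why the old limits block multi-round
# tool calling. The per-round thinking budget is an assumption for
# illustration, not a measured value.
MAX_RESPONSE_LENGTH = 2560    # raised from 500 in this PR
MAX_OBS_LENGTH = 1024         # raised from 512 in this PR
THINK_TOKENS_PER_ROUND = 200  # assumed reasoning tokens per round

def max_tool_rounds(response_len: int, obs_len: int, think_len: int) -> int:
    """Upper bound on full think + observation rounds that fit in the response."""
    return response_len // (obs_len + think_len)

print(max_tool_rounds(MAX_RESPONSE_LENGTH, MAX_OBS_LENGTH, THINK_TOKENS_PER_ROUND))  # 2 rounds
print(max_tool_rounds(500, 512, THINK_TOKENS_PER_ROUND))  # 0 rounds under the old limits
```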

@jdf-prog merged commit cf4bd14 into main on Jun 30, 2025