[WIP] Search-R1 Adaptation & Reproduction #71

EigenTom · 2025-06-27T16:13:09Z

This PR aims to merge code that:

Integrates Search-R1 training and tool-calling capability into verl-tool.
Provides scripts and instructions for reproducing the main experiments described in the Search-R1 paper.
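The tool-calling integration described above can be pictured as one step per model turn: detect a search request, query the retriever, and feed the passages back as an observation. The sketch below assumes Search-R1-style `<search>`, `<answer>` and `<information>` tags. The helper `tool_step` and its signature are hypothetical and are not verl-tool's actual API; ending the episode on a turn with no recognized tag is likewise a simplifying assumption.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumed Search-R1-style tags; these patterns are illustrative, not verl-tool's code.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def tool_step(
    response: str, retrieve: Callable[[str], List[str]]
) -> Tuple[Optional[str], bool]:
    """Return (observation, done) for one model turn (hypothetical helper)."""
    # An <answer> block ends the episode; there is nothing to retrieve.
    if ANSWER_RE.search(response):
        return None, True
    queries = SEARCH_RE.findall(response)
    # Simplifying assumption: a turn with neither tag also ends the episode.
    if not queries:
        return None, True
    # Use the last query in the turn and wrap the retrieved passages.
    passages = retrieve(queries[-1].strip())
    body = "\n".join(f"Doc {i + 1}: {p}" for i, p in enumerate(passages))
    return f"\n<information>{body}</information>\n", False
```

In a real rollout, `retrieve` would call the retriever service; here any callable that returns a list of strings works, which keeps the parsing logic testable without a running retriever.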

Progress:

Integrate the Search-R1 reward manager and tool-calling logic
Enable async model training
Run Search-R1 training with the same hyperparameters, training data, and retriever
Implement one-step model evaluation code for reproduction verification
Refactor code to disentangle Search-R1 logic from general services
Incorporate Search-R1's retriever into verl-tool
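For the reward manager item above, an outcome reward in the Search-R1 style can be sketched as exact match between the last `<answer>` block and any gold answer after SQuAD-style normalization. This is a minimal sketch under that assumption; the function names (`normalize_answer`, `extract_answer`, `em_reward`) are hypothetical and the reward manager merged in this PR may differ in details such as format penalties.

```python
import re
import string
from typing import List, Optional

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> block, if any."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def em_reward(response: str, golden_answers: List[str]) -> float:
    """1.0 if the extracted answer matches any gold answer after normalization, else 0.0."""
    answer = extract_answer(response)
    if answer is None:
        # Assumption: a response without an <answer> block earns no reward.
        return 0.0
    pred = normalize_answer(answer)
    return float(any(pred == normalize_answer(g) for g in golden_answers))
```

Because the reward is binary per sample, the batch mean of these scores is what a plot of mean score over training steps would track.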

Current training curve:

We currently use the refined retriever from SGLang's port in verl.

With it, training the model (Qwen2.5-3B) already converges significantly faster than with SGLang's official port.

Top: training log reported in SGLang's port.

The link to this wandb log

Above: SGLang's results, which only reach ~35% accuracy in critic/score/mean at step 50.
Below: our results, which reach ~40% accuracy in critic/score/mean before step 40.

Note:

As stated in SGLang's port, Search-R1's official offline retriever has lower precision and leads to suboptimal training results.
To achieve comparable or even superior performance, we need to extend max_response_length from 500 to 2560 and max_obs_length from 512 to 1024. This lets the model see the full retriever results and reason for longer, and potentially allows it to make multiple rounds of tool calls.
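To see why raising max_obs_length matters, consider what happens to a retriever result that is longer than the observation budget: it is clipped, so the model never sees the tail of the passages. The helper below is a hypothetical sketch of head-truncation to `max_obs_length` tokens; the name, signature, and keep-the-head policy are assumptions, not verl-tool's implementation. With a Hugging Face tokenizer one could pass `lambda s: tokenizer.encode(s, add_special_tokens=False)` and `tokenizer.decode`; the test uses a toy character-level tokenizer so it runs standalone.

```python
from typing import Callable, List, Tuple


def truncate_observation(
    obs_text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_obs_length: int,
) -> Tuple[str, bool]:
    """Clip a tool observation to at most max_obs_length tokens (hypothetical helper).

    Returns the possibly clipped text and whether clipping happened.
    """
    ids = encode(obs_text)
    if len(ids) <= max_obs_length:
        return obs_text, False
    # Keep the head of the observation; the remaining tokens are dropped.
    return decode(ids[:max_obs_length]), True
```

For example, a 700-token retriever result is clipped under a 512-token budget but passes through unchanged under a 1024-token budget.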


EigenTom and others added 15 commits June 25, 2025 04:09

feat: implemented preliminary search-r1 port

109fa37

feat: fixed training script

79fa4a2

feat: decoupled hierarchical tag-matching logic from serve.py

162e07a

chore: training scripts for search-r1 reproduction

28f8a2c

fix: increased len(context/return) in train sh

ec60507

chore: added README.md for search-r1 reproduction

bc5117a

chore: updated eval results at 160 step

b7b6e26

chore: updated performance comparison table format

3b5ea78

chore: added record at 200 step

f29c2ce

chore: added reimplementation's training tensorboard report

b7bfe1e

chore: added more instruction in README.md

26fc859

Update README.md

6b81878

update the done logic to be insider the search tool only

e1106f3

Merge branch 'luyi-0625-searchr1' of https://github.com/TIGER-AI-Lab/…

ee99934

…verl-tool into luyi-0625-searchr1

Merge branch 'main' into luyi-0625-searchr1

c148e2d

jdf-prog merged commit cf4bd14 into main Jun 30, 2025

ShenzheZhu pushed a commit to ShenzheZhu/verl-tool that referenced this pull request Sep 15, 2025

Merge pull request TIGER-AI-Lab#71 from TIGER-AI-Lab/luyi-0625-searchr1

e5508db

[WIP] Search-R1 Adaptation & Reproduction
