🙋 Please let us know if you find a mistake or have any suggestions!
🌟 If you find this resource helpful, please consider starring this repository and citing our research!
- 2025-10-03: 📢 Our paper "RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning" is now available on arXiv!
- 2025-10-01: 🚀 We released RewardMap and the corresponding ReasonMap-Plus!
If you face any issues with the installation, please feel free to open an issue. We will try our best to help you.
```bash
pip install -r requirements.txt
```
You can download ReasonMap-Plus (for evaluation) and ReasonMap-Train (for RewardMap training) from HuggingFace or by running the following command:
```bash
python utils/download_dataset.py
```
Then, put the data under the folder `data`.
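If you prefer to fetch the data manually, here is a minimal sketch using `huggingface_hub`; the repo IDs below are placeholders, so check the project page for the actual dataset names.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo IDs -- replace with the actual dataset names from the project page.
for repo_id in ["your-org/ReasonMap-Plus", "your-org/ReasonMap-Train"]:
    snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # these are dataset repos, not model repos
        local_dir=f"data/{repo_id.split('/')[-1]}",  # e.g. data/ReasonMap-Plus
    )
```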
You can train the model by running the following command:
```bash
# RewardMap training
bash scripts/reward_map.sh
```
Then, you can merge the trained model by running:
```bash
# merge trained model
bash scripts/merge_model.sh
```
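As a quick sanity check (not part of the official scripts), you can try loading the merged checkpoint with Hugging Face Transformers. This sketch assumes the merged model is a standard Qwen2-VL-style checkpoint, as the VLMEvalKit config further below suggests; the path is a placeholder.

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "path/to/your_model"  # placeholder: wherever merge_model.sh wrote the weights
model = Qwen2VLForConditionalGeneration.from_pretrained(model_path, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_path)
print(model.config.model_type)  # expect "qwen2_vl" if the merge produced a valid checkpoint
```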
We use LLaMA-Factory to conduct SFT training. Please first put the file `sft.yaml` under the folder `examples/train_full` of the LLaMA-Factory repo, and prepare the datasets by running the following command:
```bash
python utils/prepare_data_for_sft.py --dataset_dir path/to/your_data
```
Your data will be converted into the following format:
```json
{
    "conversations": [
        {
            "from": "human",
            "value": "<image> Please solve the multiple choice problem and put your answer (one of ABCD) in one \"\\boxed{}\". According to the subway map, how many intermediate stops are there between Danube Station and Ibn Battuta Station (except for these two stops)? \nA) 8 \nB) 1 \nC) 25 \nD) 12 \n"
        },
        {
            "from": "gpt",
            "value": "B"
        }
    ],
    "images": [
        "./maps/united_arab_emirates/dubai.png"
    ]
},
```
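For reference, below is a minimal sketch of the kind of conversion `utils/prepare_data_for_sft.py` performs; the raw-record field names (`question`, `answer`, `image_path`) and the input file name are assumptions for illustration, not the script's actual schema.

```python
import json

def to_sharegpt(records):
    """Convert raw QA records into LLaMA-Factory's ShareGPT format."""
    out = []
    for r in records:
        out.append({
            "conversations": [
                {"from": "human", "value": "<image> " + r["question"]},
                {"from": "gpt", "value": r["answer"]},
            ],
            "images": [r["image_path"]],
        })
    return out

with open("raw_records.json") as f:           # hypothetical input file
    records = json.load(f)
with open("reason_map_plus.json", "w") as f:  # file name matches dataset_info.json below
    json.dump(to_sharegpt(records), f, ensure_ascii=False, indent=2)
```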
Then, add the data information in the file `LLaMA-Factory/data/dataset_info.json`:
"reasonmap_plus": {
"file_name": "reason_map_plus.json",
"formatting": "sharegpt",
"ranking": false,
"columns": {
"messages": "conversations",
"images": "images"
}
}
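If you prefer, the same entry can be registered programmatically instead of editing the file by hand; this small sketch assumes you run it from the LLaMA-Factory repo root.

```python
import json

info_path = "data/dataset_info.json"  # relative to the LLaMA-Factory repo root
with open(info_path) as f:
    info = json.load(f)

# Same entry as shown above.
info["reasonmap_plus"] = {
    "file_name": "reason_map_plus.json",
    "formatting": "sharegpt",
    "ranking": False,
    "columns": {"messages": "conversations", "images": "images"},
}

with open(info_path, "w") as f:
    json.dump(info, f, indent=2)
```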
Then run the following command under the LLaMA-Factory repo:
```bash
# SFT training
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/reason-map-plus.yaml
```
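Here, `FORCE_TORCHRUN=1` tells `llamafactory-cli` to launch the job via `torchrun`, which is what you want for multi-GPU training.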
You can evaluate the model performance on ReasonMap or ReasonMap-Plus by following the guidelines in the ReasonMap repo.
We use VLMEvalKit to evaluate our models on other benchmarks. To conduct the evaluation, first add the model information in `VLMEvalKit/vlmeval/config.py`:
"your-model-name": partial(
Qwen2VLChat,
model_path="path/to/your_model",
min_pixels=1280 * 28 * 28,
max_pixels=16384 * 28 * 28,
use_custom_prompt=False,
),
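The `min_pixels` / `max_pixels` bounds follow the Qwen2-VL convention of measuring image resolution in 28 × 28 patches, so they effectively cap the number of visual tokens per image.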
Then run the following command under the VLMEvalKit repo:
```bash
# evaluate on other benchmarks
bash script/eval_other_benchmarks.sh
```
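This script presumably wraps VLMEvalKit's standard `run.py` entry point (e.g., `python run.py --data <benchmark> --model your-model-name`); check the script itself for the exact list of benchmarks.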
This source code is derived from the PyTorch reimplementation of Seg-Zero.
If you find this work useful in your research, please consider citing our paper:
```bibtex
@article{feng2025rewardmap,
    title={RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning},
    author={Feng, Sicheng and Tuo, Kaiwen and Wang, Song and Kong, Lingdong and Zhu, Jianke and Wang, Huan},
    journal={arXiv preprint arXiv:2510.02240},
    year={2025}
}
```