RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

arXiv


🙋 Please let us know if you find a mistake or have any suggestions!

🌟 If you find this resource helpful, please consider starring this repository and citing our research!

Updates

  • 2025-10-03: 📢 Our paper "RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning" is now available on arXiv!
  • 2025-10-01: 🚀 We released RewardMap and the corresponding ReasonMap-Plus!

Usage

1. Install dependencies

If you face any issues with the installation, please feel free to open an issue. We will try our best to help you.

pip install -r requirements.txt
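
After installation, you can optionally run a quick sanity check. The snippet below is a minimal sketch that assumes the requirements include PyTorch and Transformers; adjust it to whatever your environment actually provides:

import torch
import transformers

# Confirm the core libraries import correctly and a GPU is visible,
# since RL training is typically run on CUDA devices.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)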

2. Download the dataset

You can download ReasonMap-Plus (for evaluation) and ReasonMap-Train (for RewardMap training) from Hugging Face, or by running the following command:

python utils/download_dataset.py

Then, put the downloaded data under the data folder.
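
If you prefer to fetch the data from Hugging Face in Python instead, the sketch below uses huggingface_hub.snapshot_download. The repository IDs are placeholders, not the real dataset names, so replace them with the actual ReasonMap-Plus and ReasonMap-Train repositories:

from huggingface_hub import snapshot_download

# Placeholder repository IDs -- substitute the actual Hugging Face dataset names.
DATASETS = {
    "ReasonMap-Plus": "your-org/ReasonMap-Plus",
    "ReasonMap-Train": "your-org/ReasonMap-Train",
}

for name, repo_id in DATASETS.items():
    # Download each dataset snapshot into a subfolder of data/
    # (check the layout expected by the training and evaluation scripts).
    snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=f"data/{name}")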

3. Training

You can train the model by running the following command:

# RewardMap training
bash scripts/reward_map.sh

Then, you can merge the trained model by running:

# merge trained model
bash scripts/merge_model.sh

We use LLaMA-Factory for SFT training. First, put the file sft.yaml under the folder examples/train_full of the LLaMA-Factory repo, then prepare the datasets by running the following command:

python utils/prepare_data_for_sft.py --dataset_dir path/to/your_data

Your data will be converted into the following format:

  {
    "conversations": [
      {
        "from": "human",
        "value": "<image> Please solve the multiple choice problem and put your answer (one of ABCD) in one \"\\boxed{}\". According to the subway map, how many intermediate stops are there between Danube Station and lbn Battuta Station (except for this two stops)? \nA) 8 \nB) 1 \nC) 25 \nD) 12 \n"
      },
      {
        "from": "gpt",
        "value": "B"
      }
    ],
    "images": [
      "./maps/united_arab_emirates/dubai.png"
    ]
  },
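
The script handles this conversion for you; the sketch below only illustrates the target ShareGPT layout with a hypothetical record (the field names question, answer, and image_path are made up for illustration):

import json

# Hypothetical raw record -- the real fields come from prepare_data_for_sft.py.
record = {
    "question": "<image> Please solve the multiple choice problem ...",
    "answer": "B",
    "image_path": "./maps/united_arab_emirates/dubai.png",
}

# Wrap the record in the ShareGPT-style structure that LLaMA-Factory expects.
sharegpt_entry = {
    "conversations": [
        {"from": "human", "value": record["question"]},
        {"from": "gpt", "value": record["answer"]},
    ],
    "images": [record["image_path"]],
}

with open("reason_map_plus.json", "w", encoding="utf-8") as f:
    json.dump([sharegpt_entry], f, ensure_ascii=False, indent=2)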

Then, add the dataset information to the file LLaMA-Factory/data/dataset_info.json:

  "reasonmap_plus": {
    "file_name": "reason_map_plus.json",
    "formatting": "sharegpt",
    "ranking": false,
    "columns": {
      "messages": "conversations",
      "images": "images"
    }
  }
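
If you prefer not to edit the file by hand, the snippet below patches it programmatically. It is a convenience sketch that assumes you run it from the root of the LLaMA-Factory repo; the entry itself mirrors the one shown above:

import json

info_path = "data/dataset_info.json"

# Load the existing dataset index shipped with LLaMA-Factory.
with open(info_path, encoding="utf-8") as f:
    dataset_info = json.load(f)

# Register the converted ReasonMap-Plus SFT data under the name used in the training config.
dataset_info["reasonmap_plus"] = {
    "file_name": "reason_map_plus.json",
    "formatting": "sharegpt",
    "ranking": False,
    "columns": {"messages": "conversations", "images": "images"},
}

with open(info_path, "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, indent=2, ensure_ascii=False)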

Then run the following command under the LLaMA-Factory repo:

# SFT training
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/reason-map-plus.yaml

4. Evaluation

You can evaluate model performance on ReasonMap or ReasonMap-Plus by following the guidelines in the ReasonMap repository.

We use VLMEvalKit to evaluate our models on other benchmarks. To conduct the evaluation, first add the model information to VLMEvalKit/vlmeval/config.py:

"your-model-name": partial(
    Qwen2VLChat,
    model_path="path/to/your_model",
    min_pixels=1280 * 28 * 28,
    max_pixels=16384 * 28 * 28,
    use_custom_prompt=False,
),
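
To confirm the entry is picked up before launching a full evaluation, you can run a quick check like the one below. It assumes VLMEvalKit exposes its model registry as vlmeval.config.supported_VLM (true in current versions, but verify against your checkout):

# Quick sanity check that the new model entry is registered.
from vlmeval.config import supported_VLM

assert "your-model-name" in supported_VLM
model = supported_VLM["your-model-name"]()  # instantiates the wrapped Qwen2VLChat model
print(type(model).__name__)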

Then run the following command under the VLMEvalKit repo:

# evaluate on other benchmarks
bash script/eval_other_benchmarks.sh

Acknowledgement

This source code is derived from the PyTorch reimplementation of Seg-Zero.

Citation

If you find our work useful in your research, please consider citing our paper:

@article{feng2025rewardmap,
  title={RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning},
  author={Feng, Sicheng and Tuo, Kaiwen and Wang, Song and Kong, Lingdong and Zhu, Jianke and Wang, Huan},
  journal={arXiv preprint arXiv:2510.02240},
  year={2025}
}
