This repository contains the official code for our ICCV 2025 paper *ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models*.
We tested our codebase with PyTorch 2.0.1. Please install the PyTorch and CUDA versions that match your hardware.

```shell
conda create -n ONLY python=3.10
conda activate ONLY
git clone https://github.com/zifuwan/ONLY.git
cd ONLY
pip install -r requirements.txt
python -m pip install -e transformers
```
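After installing, it can be useful to confirm that PyTorch imports correctly and that CUDA is visible before downloading checkpoints. The snippet below is a hypothetical sanity check (the helper name `check_env` is ours, not part of the repository):

```python
def check_env():
    """Report PyTorch version and CUDA availability as a dict.

    Hypothetical helper, not part of the ONLY codebase.
    """
    info = {}
    try:
        import torch
        info["torch"] = torch.__version__       # e.g. "2.0.1"
        info["cuda"] = torch.cuda.is_available()  # True if a GPU is usable
    except ImportError:
        # PyTorch missing: rerun `pip install -r requirements.txt`
        info["torch"] = None
        info["cuda"] = False
    return info

if __name__ == "__main__":
    print(check_env())
```

If `cuda` is `False` on a GPU machine, the installed PyTorch wheel likely does not match your CUDA toolkit version.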
Please also download the model checkpoints:
- LLaVA-1.5: Download LLaVA-1.5 merged 7B
- InstructBLIP: Download InstructBLIP
- Qwen-VL-Chat: Download Qwen-VL-Chat
We provide code for evaluating ONLY on the POPE, CHAIR, and MME-Hallucination benchmarks. Run the experiments with the following commands:
- POPE:
  ```shell
  bash eval_bench/scripts/pope_eval.sh
  ```
- CHAIR:
  ```shell
  bash eval_bench/scripts/chair_eval.sh
  ```
- MME:
  ```shell
  bash experiments/cd_scripts/mme_eval.sh
  ```
Our codebase is adapted from RITUAL, VCD, OPERA, LLaVA, and DeGF. We thank the authors for releasing their code!
If you have any questions, please contact [email protected].
If you find this code useful, please consider citing our work:
```bibtex
@article{wan2025only,
  title={ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models},
  author={Wan, Zifu and Zhang, Ce and Yong, Silong and Ma, Martin Q and Stepputtis, Simon and Morency, Louis-Philippe and Ramanan, Deva and Sycara, Katia and Xie, Yaqi},
  journal={arXiv preprint arXiv:2507.00898},
  year={2025}
}
```