🌐 Homepage | 🤗 Dataset | 📖 Paper
- 🔥[2025-03-25]: Dataset available on HuggingFace. Paper available on arXiv.
Download the TSV files from HuggingFace and store them in `data/tsv/`. The files should follow the naming pattern `data/tsv/{DATASET}_{SETTING}_{LANGUAGE}.tsv`.
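If you prefer to script the download, the sketch below uses `huggingface_hub`; the repo id is a placeholder, so replace it with the actual PM4Bench dataset repository linked above:

```python
# Sketch: fetch the PM4Bench TSV files from the Hugging Face Hub.
# NOTE: "<hf-org>/PM4Bench" is a placeholder repo id -- replace it with the
# dataset repository behind the "Dataset" link above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<hf-org>/PM4Bench",   # placeholder
    repo_type="dataset",
    local_dir="data/tsv",          # expected layout: data/tsv/{DATASET}_{SETTING}_{LANGUAGE}.tsv
    allow_patterns=["*.tsv"],      # only pull the TSV files
)
```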
```bash
conda create -n pm4bench python=3.10.5
conda activate pm4bench
pip install -r requirements.txt
```

API inference requires an `API_KEY`. Please configure the `API_KEY` in the `.env` file in the following format:
```
gpt-4o-2024-11-20='xxx'
step-1o-vision-32k='xxx'
qwen2.5-vl-72b-instruct='xxx'
gemini-2.0-flash-thinking-exp='xxx'
DeepSeek-R1='xxx'
gpt-4o-mini='xxx'
```

The `API_KEY` will be loaded through the `infer_api.py` file using:
```python
load_dotenv()  # load the .env file to get the API_KEY
API_KEY = os.getenv(MODEL)
```

🔴 Attention: All code and scripts are executed from the root directory!
For example:

```bash
python code/infer_api.py [MODEL] [MODE] [SETTING] [LANGUAGE] [TASK] [DATASET] [MAX_TOKENS]
```
- `MODEL`: Official model name, such as `gpt-4o-2024-11-20`, `qwen2.5-vl-72b-instruct`, etc.
- `MODE`: For normal VLMs, use `direct`; for reasoning VLMs, use `cot`.
- `SETTING`: `traditional` or `vision`; for detailed explanations, please refer to our paper.
- `LANGUAGE`: One of 10 language choices: `[ZH, EN, AR, SR, TH, RU, KO, CS, HU, VI]`.
- `TASK`: `OCR` for OCR tasks and `VQA` for VQA tasks under the `traditional` or `vision` setting.
- `DATASET`: One of `[MDUR, MIQA, MMJB, MSOCR]`.
- `MAX_TOKENS`: Should be set per model to avoid outputs being cut off.
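For reference, here is a minimal, illustrative sketch of how a per-model key from `.env` can be used with an OpenAI-compatible client; it is not the repo's actual inference code, and the prompt and `max_tokens` value are placeholders:

```python
# Illustrative sketch only -- the real logic lives in code/infer_api.py.
import os
from dotenv import load_dotenv
from openai import OpenAI

MODEL = "gpt-4o-2024-11-20"

load_dotenv()                                  # read API keys from .env
client = OpenAI(api_key=os.getenv(MODEL))      # the key is stored under the model name

response = client.chat.completions.create(
    model=MODEL,
    max_tokens=4096,                           # tune per model to avoid truncation
    messages=[{"role": "user", "content": "placeholder prompt"}],
)
print(response.choices[0].message.content)
```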
Alternatively, we provide a standard script template `scripts/infer_api.sh`. You can modify the parameters directly and run it with:
```bash
nohup bash scripts/infer_api.sh > logs/infer_api.log 2>&1 &
```

Step 0. Use LMDeploy to serve models
A special thanks to LMDeploy for their work, which has greatly assisted in providing local inference for our work. Please refer to the LMDeploy docs for detailed information on deploying and serving VLMs. Before inference, make sure the VLM is being served and that you have a local port (like `23333`) to call it:
```bash
CUDA_VISIBLE_DEVICES=$CUDA_DEVICES nohup lmdeploy serve api_server $MODEL_PATH \
    --backend turbomind --dtype $DTYPE --server-port $SERVER_PORT --tp $TP > $LOG_PATH 2>&1 &
```

We only provide a simplified command line here; if you want to know more parameters and their meanings, please run:
```bash
lmdeploy serve api_server --help
```

🔴 Attention: All code and scripts are executed from the root directory!
For example:

```bash
python code/infer_lmdeploy.py [MODEL] [MODE] [SETTING] [LANGUAGE] [TASK] [DATASET] [MAX_TOKENS] [PORT]
```
- `MODEL`: Model name, such as `InternVL2_5-78B-MPO`, `qwen2.5-vl-72b-instruct`, etc.
- `MODE`: For normal VLMs, use `direct`; for reasoning VLMs, use `cot`.
- `SETTING`: `traditional` or `vision`; for detailed explanations, please refer to our paper.
- `LANGUAGE`: One of 10 language choices: `[ZH, EN, AR, SR, TH, RU, KO, CS, HU, VI]`.
- `TASK`: `OCR` for OCR tasks and `VQA` for VQA tasks under the `traditional` or `vision` setting.
- `DATASET`: One of `[MDUR, MIQA, MMJB, MSOCR]`.
- `MAX_TOKENS`: Should be set per model to avoid outputs being cut off.
- `PORT`: Local port (like `23333`) of the LMDeploy server to call.
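Before launching inference, you may want to confirm the server is reachable. The sketch below assumes LMDeploy's OpenAI-compatible API is served on port `23333`; it is illustrative and not part of the repo's scripts:

```python
# Illustrative sketch: verify the local LMDeploy server responds before running
# code/infer_lmdeploy.py. Assumes the OpenAI-compatible /v1 API on port 23333.
from openai import OpenAI

PORT = 23333
client = OpenAI(api_key="EMPTY", base_url=f"http://127.0.0.1:{PORT}/v1")

models = client.models.list()        # lists the model(s) served by LMDeploy
print([m.id for m in models.data])
```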
Alternatively, we provide a standard script template `scripts/infer_lmdeploy.sh`. You can modify the parameters directly and run it with:
```bash
nohup bash scripts/infer_lmdeploy.sh > logs/infer_lmdeploy.log 2>&1 &
```

We use `gpt-4o-2024-11-20` to judge VQA performance, so you should configure the `API_KEY` before evaluation. Besides, you can change the judge model in `code/eval/{DATASET}/eval_{DATASET}_vqa.py`:
```python
OPENAI_API_BASE = "https://api.openai.com/v1"
client = OpenAI(
    api_key=os.getenv('gpt-4o-2024-11-20'),
    base_url=OPENAI_API_BASE,
)
```

The evaluation code is executed by:
```bash
python code/eval/{DATASET}/eval_{DATASET}_{TASK}.py
```

where `DATASET` is chosen from `[MDUR, MIQA, MMJB, MSOCR]` and `TASK` is chosen from `[VQA, OCR]`.
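If you want to run every evaluation in one go, a simple driver like the following works; it is a convenience sketch, not part of the repo, and the casing of the task suffix should be adjusted to match the actual file names under `code/eval/`:

```python
# Convenience sketch: run every DATASET x TASK evaluation sequentially.
# Adjust the task suffix casing to match the files in code/eval/ if needed.
import subprocess

DATASETS = ["MDUR", "MIQA", "MMJB", "MSOCR"]
TASKS = ["VQA", "OCR"]

for dataset in DATASETS:
    for task in TASKS:
        script = f"code/eval/{dataset}/eval_{dataset}_{task}.py"
        print(f"Running {script} ...")
        subprocess.run(["python", script], check=True)  # run from the repo root
```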
The statistics code is executed by:

```bash
python code/score.py
```

and the results are stored in `data/results/{DATASET}_{TASK}_{SETTING}.csv`.
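To take a quick look at a results file, you can load it with pandas; the column layout depends on the scoring script, so the snippet below simply prints whatever is there (the path is an example following the pattern above):

```python
# Quick inspection of a results file produced by code/score.py.
# No column names are assumed; we just print the head of the table.
import pandas as pd

df = pd.read_csv("data/results/MDUR_VQA_traditional.csv")  # example path
print(df.head())
```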
If you find this work helpful, please consider starring 🌟 this repo. Thanks for your support!
```bibtex
@misc{gao2025pm4benchparallelmultilingualmultimodal,
      title={PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model},
      author={Junyuan Gao and Jiahe Song and Jiang Wu and Runchuan Zhu and Guanlin Shen and Shasha Wang and Xingjian Wei and Haote Yang and Songyang Zhang and Weijia Li and Bin Wang and Dahua Lin and Lijun Wu and Conghui He},
      year={2025},
      eprint={2503.18484},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18484},
}
```