
Paper | Project Page

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

* equal contributions       + corresponding authors

📣 Overview

Unified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration. Existing evaluations either treat the two abilities in isolation or overlook tasks that inherently couple them. To address this gap, we present Uni-MMMU, a comprehensive and discipline-aware benchmark that systematically unfolds the bidirectional synergy between generation and understanding across eight reasoning-centric domains, including science, coding, mathematics, and puzzles. Each task is bidirectionally coupled, requiring models to (i) leverage conceptual understanding to guide precise visual synthesis, or (ii) use generation as a cognitive scaffold for analytical reasoning. Uni-MMMU incorporates verifiable intermediate reasoning steps, unique ground truths, and a reproducible scoring protocol for both textual and visual outputs. Through extensive evaluation of state-of-the-art unified, generation-only, and understanding-only models, we reveal substantial performance disparities and cross-modal dependencies, offering new insights into when and how these abilities reinforce one another, and establishing a reliable foundation for advancing unified models.

Framework

Overview of Uni-MMMU. Eight tasks are grouped into two paradigms: generation aids understanding (Maze, Sliding, Geometry, Jigsaw) and understanding guides generation (Science: Physics/Chemistry/Biology; Code Rendering). Each task reports dual-channel scores (text + image).
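To make the task grouping and dual-channel scoring concrete, here is a small illustrative sketch in Python; the task identifiers and the simple averaging rule are assumptions for illustration only, not the repository's actual scoring code.

```python
# Illustrative sketch of the two paradigms and dual-channel (text + image)
# scoring described in the figure caption. Task identifiers and the averaging
# rule are assumptions, not the repository's actual scoring code.
GENERATION_AIDS_UNDERSTANDING = ["maze", "sliding_puzzle", "geometry", "jigsaw"]
UNDERSTANDING_GUIDES_GENERATION = ["physics", "chemistry", "biology", "code_rendering"]

def dual_channel_score(text_score: float, image_score: float) -> dict:
    """Combine the textual and visual channels for one task (illustrative)."""
    return {"text": text_score, "image": image_score,
            "mean": (text_score + image_score) / 2}

# Example: a model scoring 0.8 on the reasoning text and 0.6 on the generated
# image for the Maze task.
print(dual_channel_score(0.8, 0.6))
```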

🔨 Installation

  1. Clone the repository.

```bash
git clone https://github.com/Vchitect/Uni-MMMU.git
cd Uni-MMMU
```

  2. Install the environment.

```bash
conda update -n base -c defaults conda
conda create -n ummmu python==3.10 -y
conda activate ummmu

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

  3. Download the dataset.

```bash
git clone https://huggingface.co/datasets/Vchitect/Uni-MMMU-Eval
cd Uni-MMMU-Eval
tar -xvf data.tar -C /path/to/Uni-MMMU
```
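
After extraction, a quick sanity check can confirm that PyTorch sees a GPU and that the dataset landed inside the repository. This is a minimal sketch; the `data/` directory name is an assumption based on the `data.tar` archive above.

```python
# Minimal post-install sanity check (a sketch; the data/ directory name is an
# assumption based on the data.tar archive extracted above).
import os
import torch

print("torch:", torch.__version__)            # expected: 2.5.1
print("CUDA available:", torch.cuda.is_available())

repo_root = os.path.dirname(os.path.abspath(__file__))
data_dir = os.path.join(repo_root, "data")    # assumed extraction target inside Uni-MMMU
if os.path.isdir(data_dir):
    print(f"found {len(os.listdir(data_dir))} entries under {data_dir}")
else:
    print("data/ not found -- check the tar -C extraction path")
```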

Usage

Sampling

  • Please refer to `./sample_code_example` for details; a minimal sketch of the expected output layout follows this list.
  • All sampled data will be written to `./outputs/<model_name>`.
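
The actual sampling interface is defined in `./sample_code_example`; the skeleton below only illustrates the `./outputs/<model_name>` layout that the evaluator reads from. The per-task directory names and JSON fields used here are assumptions.

```python
# Illustrative only: a minimal skeleton for saving sampled outputs under
# ./outputs/<model_name>. The exact file naming and per-task structure are
# defined in ./sample_code_example; the task id and JSON fields here are
# assumptions.
import json
import os

def save_sample(model_name: str, task: str, sample_id: str,
                text_answer: str, image_path: str | None = None) -> str:
    out_dir = os.path.join("outputs", model_name, task)
    os.makedirs(out_dir, exist_ok=True)
    record = {"id": sample_id, "text": text_answer, "image": image_path}
    out_path = os.path.join(out_dir, f"{sample_id}.json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return out_path

# Example usage (hypothetical task and sample names):
# save_sample("my_unified_model", "maze", "maze_0001",
#             "right, down, down", "maze_0001_step3.png")
```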

Evaluation

Command

```bash
python eval_ummmu.py --model_name model_to_be_eval
```

  • Note: this evaluation requires Qwen2.5-VL-72B and Qwen3-32B as evaluators. We recommend running it on a system with A100 80GB (or larger) GPUs to ensure sufficient memory and performance; a small launcher sketch follows.
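
As a convenience, the command can be wrapped in a small launcher that checks GPU memory first. This is a sketch: only the script name and the `--model_name` flag come from the command above, and the 75 GiB threshold is a rough proxy for the A100 80GB recommendation.

```python
# Launcher sketch: verify GPU memory before running the evaluation command
# shown above. The 75 GiB threshold approximates the A100 80GB recommendation.
import subprocess
import torch

def launch_eval(model_name: str, min_gib: float = 75.0) -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("A CUDA GPU is required for the Qwen evaluator models")
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gib < min_gib:
        print(f"warning: GPU 0 has {total_gib:.0f} GiB; 80 GB-class GPUs are recommended")
    subprocess.run(["python", "eval_ummmu.py", "--model_name", model_name], check=True)

# Example usage:
# launch_eval("model_to_be_eval")
```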

Citation

If you find our repo useful for your research, please consider citing our paper:

@misc{zou2025unimmmumassivemultidisciplinemultimodal,
      title={Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark}, 
      author={Kai Zou and Ziqi Huang and Yuhao Dong and Shulin Tian and Dian Zheng and Hongbo Liu and Jingwen He and Bin Liu and Yu Qiao and Ziwei Liu},
      year={2025},
      eprint={2510.13759},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.13759}, 
}

Related Links

Our related projects: VBench, Awesome Evaluation of Visual Generation

@InProceedings{huang2023vbench,
    title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
    author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2024}
}

@article{huang2024vbench++,
    title={{VBench++}: Comprehensive and Versatile Benchmark Suite for Video Generative Models},
    author={Huang, Ziqi and Zhang, Fan and Xu, Xiaojie and He, Yinan and Yu, Jiashuo and Dong, Ziyue and Ma, Qianli and Chanpaisit, Nattapol and Si, Chenyang and Jiang, Yuming and Wang, Yaohui and Chen, Xinyuan and Chen, Ying-Cong and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
    journal={arXiv preprint arXiv:2411.13503},
    year={2024}
}

@article{zheng2025vbench2,
    title={{VBench-2.0}: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness},
    author={Zheng, Dian and Huang, Ziqi and Liu, Hongbo and Zou, Kai and He, Yinan and Zhang, Fan and Zhang, Yuanhan and He, Jingwen and Zheng, Wei-Shi and Qiao, Yu and Liu, Ziwei},
    journal={arXiv preprint arXiv:2503.21755},
    year={2025}
}

@InProceedings{zhang2024evaluationagent,
    title = {Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models},
    author = {Zhang, Fan and Tian, Shulin and Huang, Ziqi and Qiao, Yu and Liu, Ziwei},
    booktitle={Annual Meeting of the Association for Computational Linguistics (ACL), 2025},
    year = {2024}
}
