Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Juncheng Li1, Hao Jiang2, Haoyuan Li2, Yueting Zhuang1
1Zhejiang University, 2Alibaba
†Corresponding Authors
Align2LLaVA is a novel instruction curation algorithm that works from two complementary perspectives, human preference alignment and LLM preference alignment, to compress a vast corpus of machine-generated multimodal instructions into a compact, high-quality form.
In this repository, we provide the implementation of our proposed reward models for human knowledge alignment, along with the LLaVA-1.5 instruction tuning setup.
- Clone this repository and enter the root directory.

  git clone https://github.com/DCDmllm/Align2LLaVA.git
  cd Align2LLaVA
- Clone the LLaVA repository and install the environment for LLaVA-1.5 instruction tuning.

  git clone https://github.com/haotian-liu/LLaVA.git
  cd LLaVA
  conda create -n llava python=3.10 -y
  conda activate llava
  pip install --upgrade pip  # enable PEP 660 support
  pip install -e .
- Install additional packages required for training.

  pip install -e ".[train]"
  pip install flash-attn --no-build-isolation
- Clone the LLaVA-1.5 environment to prepare a separate one for the reward models.

  conda create -n align2llava_rm --clone llava
  conda activate align2llava_rm
  pip uninstall llava  # reward models and LLaVA-1.5 use different code bases
The implementation of our reward models is provided in the `reward_model` directory; see that directory for details.
We directly fine-tune LLaVA-1.5 on our aligned instructions without any changes to the official code base. To start training, specify the data path and run the script:
cd LLaVA
bash ./scripts/v1_5/finetune_lora.sh
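In LLaVA's official script, the training data location is written directly into the launch command via the `--data_path` and `--image_folder` arguments, so these need to be pointed at the curated instructions before launching. A minimal sketch, assuming the aligned data is stored as a LLaVA-format instruction JSON; the `DATA_JSON` and `IMAGE_DIR` values below are placeholders, not files shipped with this repository:

```bash
cd LLaVA

# Placeholders: the curated instruction JSON and the folder holding the
# images it references.
DATA_JSON=./playground/data/align2llava_instructions.json
IMAGE_DIR=./playground/data

# Rewrite the hard-coded data arguments in place (equivalently, edit the
# script by hand), then launch fine-tuning as shown above.
sed -i "s|--data_path [^ ]*|--data_path ${DATA_JSON}|" scripts/v1_5/finetune_lora.sh
sed -i "s|--image_folder [^ ]*|--image_folder ${IMAGE_DIR}|" scripts/v1_5/finetune_lora.sh
```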
To evaluate the fine-tuned model, see the official LLaVA evaluation documentation for details.
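For reference, the LLaVA repository ships per-benchmark evaluation scripts under `scripts/v1_5/eval/`. The sketch below assumes the TextVQA benchmark data has already been prepared as described in that documentation and that the checkpoint path inside the script has been edited to point at the fine-tuned model:

```bash
cd LLaVA
conda activate llava

# Example benchmark run (TextVQA); other benchmarks have analogous scripts
# under scripts/v1_5/eval/. Edit the checkpoint path in the script to point
# at the fine-tuned model before running.
bash ./scripts/v1_5/eval/textvqa.sh
```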
If you find this work useful, please consider giving this repository a star and citing our paper as follows:
@misc{huang2024align2llavacascadedhumanlarge,
title={Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation},
author={Hongzhe Huang and Jiang Liu and Zhewen Yu and Li Cai and Dian Jiao and Wenqiao Zhang and Siliang Tang and Juncheng Li and Hao Jiang and Haoyuan Li and Yueting Zhuang},
year={2024},
eprint={2409.18541},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2409.18541},
}
Our project is built upon the following repositories: