Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Hongzhe Huang¹, Jiang Liu¹, Zhewen Yu¹, Li Cai¹, Dian Jiao¹, Wenqiao Zhang¹†, Siliang Tang¹,

Juncheng Li¹, Hao Jiang², Haoyuan Li², Yueting Zhuang¹

¹Zhejiang University, ²Alibaba

†Corresponding Authors

Overview

Align2LLaVA is a novel instruction curation algorithm built on two complementary perspectives, human and LLM preference alignment, that compresses a vast corpus of machine-generated multimodal instructions into a compact, high-quality form.

In this repository, we provide the implementation of our proposed reward models for human knowledge alignment, as well as the setup for LLaVA-1.5 instruction tuning.

Installation

  1. Clone this repository and enter the root directory.

    git clone https://github.com/DCDmllm/Align2LLaVA.git
    cd Align2LLaVA
    
  2. Clone the LLaVA repository, and install the environment for LLaVA-1.5 instruction tuning.

    git clone https://github.com/haotian-liu/LLaVA.git
    cd LLaVA
    conda create -n llava python=3.10 -y
    conda activate llava
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
    
  3. Install additional packages for training.

    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
    
  4. Clone the LLaVA-1.5 environment to prepare a new one for the reward model.

    conda create -n align2llava_rm --clone llava
    conda activate align2llava_rm
    pip uninstall llava  # the reward models and LLaVA-1.5 use different code bases
    

Reward Model

The implementation of our reward model is in the reward_model directory; see that directory for details.

Dataset

Todo

LLaVA-1.5 Instruction Tuning

We directly fine-tune LLaVA-1.5 on our aligned instructions without any changes to the official code base. To start training, specify the data path and run the script:

cd LLaVA
bash ./scripts/v1_5/finetune_lora.sh
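
The fields that usually need to be edited in finetune_lora.sh before launching are the data-related arguments; a minimal sketch with placeholder paths, assuming the argument names of the official LLaVA-1.5 training script:

    # Arguments inside scripts/v1_5/finetune_lora.sh to point at the aligned data
    # (placeholder paths; all other arguments stay as in the official script):
    --data_path /path/to/align2llava_instructions.json \
    --image_folder /path/to/images \
    --output_dir ./checkpoints/llava-v1.5-13b-align2llava-lora \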

To evaluate the fine-tuned model, see the official evaluation document for details.
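
For example, the benchmark-specific scripts shipped with LLaVA-1.5 can be run against the fine-tuned checkpoint once the corresponding evaluation data is prepared; the command below assumes the layout of the official LLaVA repository:

    cd LLaVA
    # Example: evaluate on MME (requires the MME data prepared as described in the official evaluation guide)
    bash ./scripts/v1_5/eval/mme.sh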

Referencing and Citing

If you find this work useful, please consider giving this repository a star and citing our paper as follows:

@misc{huang2024align2llavacascadedhumanlarge,
      title={Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation}, 
      author={Hongzhe Huang and Jiang Liu and Zhewen Yu and Li Cai and Dian Jiao and Wenqiao Zhang and Siliang Tang and Juncheng Li and Hao Jiang and Haoyuan Li and Yueting Zhuang},
      year={2024},
      eprint={2409.18541},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2409.18541}, 
}

Acknowledgment

Our project is developed based on the following repositories:

  • LLaVA: Large Language and Vision Assistant
  • CogVLM: Visual Expert for Pretrained Language Models
