WeitaiKang/Robin3D

Robin3D

This is the official repository for the paper "Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning", ICCV 2025. [paper]

🔨 Preparation

  • Prepare the environment:

    conda create -n chat-3d-v2 python=3.9.17
    conda activate chat-3d-v2
    conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
  • Download LLM backbone:

    • We use Vicuna-7B v1.5 in our experiments, which can be downloaded from Hugging Face.

    • Change the llama_model_path in config.py to the location of vicuna-7b-v1.5.

  • Annotations and extracted features:

    You can download them from Google Drive. See scripts/config.py for a better understanding of the files on Google Drive.

    Detailed instructions are in Chat-Scene's Preparation section.
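Before launching training, it can help to confirm that the directory you set as llama_model_path looks like a downloaded Hugging Face model. The snippet below is a minimal sketch, not part of this repository: the helper name `looks_like_hf_model_dir` and the placeholder path are assumptions made for illustration. It only checks that the directory exists and contains a `config.json`, which Hugging Face model folders ship with.

```python
from pathlib import Path


def looks_like_hf_model_dir(path: str) -> bool:
    """Hypothetical helper: True if `path` is a directory containing config.json."""
    model_dir = Path(path).expanduser()
    return model_dir.is_dir() and (model_dir / "config.json").is_file()


# Placeholder value; replace with the path you wrote into llama_model_path.
llama_model_path = "/path/to/vicuna-7b-v1.5"

if not looks_like_hf_model_dir(llama_model_path):
    print(f"Warning: {llama_model_path} does not look like a downloaded model directory")
```

This does not validate the weights themselves; it only catches a mistyped or empty path early.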

🤖 Training and Inference

  • Training

    • Stage1:

      bash scripts/run_stage1.sh 
      
      Explanation of "train_tag" and "val_tag"
      • Use # to separate different datasets

      • Datasets:

        • scanrefer: ScanRefer Dataset
        • scan2cap: Scan2Cap Dataset
        • scanqa: ScanQA Dataset
        • sqa3d: SQA3D Dataset
        • multi3dref: Multi3dRefer Dataset
        • nr3d_caption: A captioning dataset derived from Nr3D.
        • obj_align: A dataset derived from ScanRefer to align the object identifiers with object tokens.
      • Please check the script file for further explanation of the other datasets.
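To illustrate the `#` convention described above, here is a minimal Python sketch of how a tag string breaks down into dataset names. The helper `split_tags` and the example tag value are assumptions for illustration only; they are not taken from the training scripts, and the value shown is not the repository default.

```python
def split_tags(tag: str) -> list[str]:
    """Hypothetical helper: split a '#'-separated tag string into dataset names."""
    # Empty pieces (e.g. from a trailing '#') are dropped.
    return [name for name in tag.split("#") if name]


# Illustrative value only; see scripts/run_stage1.sh for the tags actually used.
train_tag = "scanrefer#scan2cap#scanqa"
print(split_tags(train_tag))
```

A single dataset needs no separator, so a tag such as `sqa3d` yields a one-element list.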

    • Stage2:

      bash scripts/run_stage2.sh 
      

    For each stage, we set the number of epochs to 3 but manually stop training after 2 epochs.

  • Evaluate

    bash scripts/eval.sh
    

    We provide our checkpoint on Google Drive.

📄 Citation

If you find this project useful in your research, please consider citing:

@article{kang2024robin3d,
  title={Robin3d: Improving 3d large language model via robust instruction tuning},
  author={Kang, Weitai and Huang, Haifeng and Shang, Yuzhang and Shah, Mubarak and Yan, Yan},
  journal={arXiv preprint arXiv:2410.00255},
  year={2024}
}

😊 Acknowledgement

Thanks to Chat-Scene for open-sourcing their code!
