This is the official repo for the paper "Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning", ICCV 2025. [paper]
- Prepare the environment:

  ```bash
  conda create -n chat-3d-v2 python=3.9.17
  conda activate chat-3d-v2
  conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install -r requirements.txt
  ```
- Download LLM backbone:
  - We use Vicuna-7B v1.5 in our experiments, which can be downloaded from Hugging Face.
  - Change the `llama_model_path` in config.py to the location of `vicuna-7b-v1.5` (see the sketch below).
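  A minimal sketch of these two steps, assuming the weights are fetched with `huggingface-cli` (any other download method works just as well); all local paths are placeholders:

  ```bash
  # Sketch only: fetch Vicuna-7B v1.5 from Hugging Face (assumes `pip install huggingface_hub`).
  huggingface-cli download lmsys/vicuna-7b-v1.5 --local-dir ./vicuna-7b-v1.5

  # Then edit config.py so that llama_model_path points to the downloaded folder, e.g.:
  #   llama_model_path = "./vicuna-7b-v1.5"
  ```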
- Annotations and extracted features:
  - You can download them from Google Drive. Please check `scripts/config.py` for a better understanding of the files on Google Drive. The detailed instructions are in Chat-Scene's Preparation.
## Training

- Stage 1:

  ```bash
  bash scripts/run_stage1.sh
  ```

  Explanation of `train_tag` and `val_tag` (see the sketch after this list):
  - Use `#` to separate different datasets.
  - Datasets:
  - Please check the script file for further explanation of the other datasets.
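  As an illustration of the tag format only, a hypothetical excerpt from `scripts/run_stage1.sh` might look like the following; the dataset names here are placeholders, so check the actual script for the real list:

  ```bash
  # Hypothetical example of the "#"-separated tag format; dataset names are placeholders.
  train_tag="scanrefer#scan2cap#scanqa"
  val_tag="scanrefer#scanqa"
  ```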
- Stage 2:

  ```bash
  bash scripts/run_stage2.sh
  ```

  For each stage, we set the number of epochs to 3, but we manually stop the training after 2 epochs.
## Evaluate

```bash
bash scripts/eval.sh
```

We provide our checkpoint in Google Drive.
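A minimal sketch of evaluating with the released checkpoint; the `./checkpoints/` location, the checkpoint filename, and the `pretrained_path` variable name are assumptions, so check `scripts/eval.sh` for the exact setting:

```bash
# Sketch: download the released checkpoint from Google Drive, e.g. into ./checkpoints/,
# then point the checkpoint path inside scripts/eval.sh at it, e.g.:
#   pretrained_path="checkpoints/robin3d.pth"   # variable name is an assumption
bash scripts/eval.sh
```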
If you find this project useful in your research, please consider citing:
```bibtex
@article{kang2024robin3d,
  title={Robin3d: Improving 3d large language model via robust instruction tuning},
  author={Kang, Weitai and Huang, Haifeng and Shang, Yuzhang and Shah, Mubarak and Yan, Yan},
  journal={arXiv preprint arXiv:2410.00255},
  year={2024}
}
```

Thanks to Chat-Scene for open-sourcing their code!