
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning



[Demo videos: four prompts, each shown side by side for the baseline and for SafeVLA]

  - navigate to a basketball
  - find to a basketball
  - locate a vase.
  - find a spray bottle and pick up that spray bottle

These demos show how SafeVLA ensures safety while still optimizing task performance.


Latest Updates


Quick Start

Setting up the Python environment

Please use the pre-built image from Docker Hub:

docker pull safevla/safevla:v0

Then start a container with the repository and your training data mounted:

export CODE_PATH=/path/to/this/repo
export DATA_PATH=/path/to/training_data
export DOCKER_IMAGE=safevla/safevla:v0
docker run \
    --gpus all \
    --device /dev/dri \
    --mount type=bind,source=${CODE_PATH},target=/root/spoc \
    --mount type=bind,source=${DATA_PATH},target=/root/data \
    --shm-size 50G \
    --runtime=nvidia \
    -it ${DOCKER_IMAGE}
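
Once inside the container, a quick sanity check (a minimal sketch; the /root/spoc and /root/data paths match the bind mounts above) confirms that the GPUs and mounted directories are visible:

nvidia-smi      # should list the host GPUs
ls /root/spoc   # should show this repository's files
ls /root/data   # should show your training data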

Then activate the conda environment:

conda activate spoc

The Safety-CHORES task we propose has been integrated into Safety-Gymnasium. Please clone Safety-Gymnasium and install it:

git clone https://github.com/PKU-Alignment/safety-gymnasium.git
cd safety-gymnasium
pip install -e .
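
To verify the install, try importing the package (a minimal check; it only confirms that the import succeeds):

python -c "import safety_gymnasium"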

Training

In order to run training and evaluation you'll need:

  1. The processed/optimized Objaverse assets along with their annotations.
  2. The set of ProcTHOR-Objaverse houses you'd like to train/evaluate on.
  3. For evaluation only, a trained model checkpoint.

Below we describe how to download the assets, annotations, and the ProcTHOR-Objaverse houses. We also describe how you can use one of our pre-trained models to run evaluation.

Downloading assets, annotations, and houses

Downloading optimized Objaverse assets and annotations

Pick a directory /path/to/objaverse_assets where you'd like to save the assets and annotations. Then run the following commands:

python -m objathor.dataset.download_annotations --version 2023_07_28 --path /path/to/objaverse_assets
python -m objathor.dataset.download_assets --version 2023_07_28 --path /path/to/objaverse_assets

These will create the directory structure:

/path/to/objaverse_assets
    2023_07_28
        annotations.json.gz                              # The annotations for each object
        assets
            000074a334c541878360457c672b6c2e             # asset id
                000074a334c541878360457c672b6c2e.pkl.gz
                albedo.jpg
                emission.jpg
                normal.jpg
                thor_metadata.json
            ... #  39663 more asset directories
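A quick way to confirm the download completed is to count the asset directories (the tree above implies 39,664 in total: the one shown plus 39,663 more):

ls /path/to/objaverse_assets/2023_07_28/assets | wc -l   # expect 39664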

Downloading ProcTHOR-Objaverse houses

Pick a directory /path/to/objaverse_houses where you'd like to save ProcTHOR-Objaverse houses. Then run:

python -m scripts.download_objaverse_houses --save_dir /path/to/objaverse_houses --subset val

to download the validation set of houses as /path/to/objaverse_houses/val.jsonl.gz. You can also change val to train to download the training set of houses.
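
As the .jsonl.gz name suggests, each line of the file should be one house specification, so you can peek at the first record to confirm the download is intact (a minimal check):

zcat /path/to/objaverse_houses/val.jsonl.gz | head -n 1 | head -c 300; echo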

Setting environment variables

Next you need to set the following environment variables:

export PYTHONPATH=/path/to/code_in_docker
export OBJAVERSE_HOUSES_DIR=/path/to/objaverse_houses
export OBJAVERSE_DATA_DIR=/path/to/objaverse_assets

For training, we recommend setting two more environment variables to avoid timeout issues from AllenAct:

export ALLENACT_DEBUG=True
export ALLENACT_DEBUG_VST_TIMEOUT=2000
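
If you work across multiple shells or sessions, it can be convenient to collect all five exports in one small script and source it before each run (a sketch; env.sh is a hypothetical name, and the paths must be adjusted to your setup):

cat > env.sh << 'EOF'
export PYTHONPATH=/path/to/code_in_docker
export OBJAVERSE_HOUSES_DIR=/path/to/objaverse_houses
export OBJAVERSE_DATA_DIR=/path/to/objaverse_assets
export ALLENACT_DEBUG=True
export ALLENACT_DEBUG_VST_TIMEOUT=2000
EOF
source env.sh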

Running Safe RL finetuning

Download the pretrained IL checkpoint:

python scripts/download_il_ckpt.py --ckpt_ids spoc_IL --save_dir PATH_TO_SAVE_DIR
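
Before launching training, it is worth confirming the checkpoint landed where expected (a minimal check; the spoc_IL/model.ckpt layout is inferred from the example command below, not documented here):

ls -lh PATH_TO_SAVE_DIR/spoc_IL/model.ckpt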

Run Safe RL training:

python training/online/dinov2_vits_tsfm_rgb_augment_objectnav.py train --il_ckpt_path IL_CKPT_PATH --num_train_processes NUM_OF_TRAIN_PROCESSES --output_dir PATH_TO_RESULT --dataset_dir PATH_TO_DATASET --cost_limit COST_LIMIT --tag EXP_NAME

For example,

python training/online/dinov2_vits_tsfm_rgb_augment_objectnav.py train --il_ckpt_path /root/data/il_ckpt/spoc_IL/model.ckpt --num_train_processes 32 --output_dir results --dataset_dir /root/data/data/astar/ObjectNavType --cost_limit 2.31964 --tag SafeVLA2.31964-ObjectNavType-RL-DinoV2-ViTS-TSFM
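
Training runs are long, so you may want to detach the process and capture its output; a generic shell pattern for this (not part of the repo's tooling) is:

mkdir -p logs
nohup python training/online/dinov2_vits_tsfm_rgb_augment_objectnav.py train \
    --il_ckpt_path /root/data/il_ckpt/spoc_IL/model.ckpt \
    --num_train_processes 32 \
    --output_dir results \
    --dataset_dir /root/data/data/astar/ObjectNavType \
    --cost_limit 2.31964 \
    --tag SafeVLA2.31964-ObjectNavType-RL-DinoV2-ViTS-TSFM \
    > logs/train.log 2>&1 &
tail -f logs/train.log   # follow progress; Ctrl-C stops tailing, not training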

Evaluation

Downloading the trained model checkpoint and running evaluation

python scripts/download_trained_ckpt.py --save_dir ckpt   # fetch the split checkpoint archive
cd ckpt
cat safevla_* | tar -xz   # reassemble the parts and extract
bash scripts/objnav.bash  # run the ObjectNav evaluation
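
The cat step reassembles a checkpoint archive that was split into parts (safevla_*) for distribution. To inspect the archive contents without re-extracting (a minimal check, run inside ckpt/):

cat safevla_* | tar -tz | head   # list the first few entries of the reassembled archive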

Citation

If you find our code or models useful in your work, please cite our paper:

@article{zhang25safevla,
    title={SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning},
    author={Borong Zhang and Yuhao Zhang and Jiaming Ji and Yingshan Lei and Josef Dai and Yuanpei Chen and Yaodong Yang},
    journal={arXiv preprint arXiv:2503.03480},
    year={2025}
} 

Acknowledgment

This repository benefits from AllenAct, AI2THOR, ProcTHOR, SPOC, FLaRe and Align-Anything.

Thanks for their wonderful work and their efforts to further promote VLA research. SafeVLA and its related assets are built and open-sourced with love and respect ❤️.
