SemNav is a visual semantic navigation model that can be deployed on any robot. It performs object goal navigation relying mainly on semantic segmentation information.
In this repository we release the SemNav dataset, code, and trained models described in our [paper].
If you use any content of this repo for your work, please cite the following bib entry:

```bibtex
@article{semnav,
  author  = {Flor-Rodr{\'i}guez, Rafael and Guti{\'e}rrez-{\'A}lvarez, Carlos and Acevedo-Rodr{\'i}guez, Francisco~J. and Lafuente-Arroyo, Sergio and L{\'o}pez-Sastre, Roberto~J.},
  title   = {SEMNAV: A Semantic Segmentation-Driven Approach to Visual Semantic Navigation},
  journal = {ArXiv},
  year    = {2025},
  month   = {June},
  day     = {02},
  doi     = {10.48550/arXiv.2506.01418},
  url     = {https://doi.org/10.48550/arXiv.2506.01418}
}
```
To run our code you need a machine running Ubuntu in order to install all the dependencies. We have tested our code on Ubuntu 20.04, 22.04, and 24.04. The easiest way to set things up is to install miniconda (if you don't already have it). You can download it from here.
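If you prefer to do it from the command line, a typical miniconda installation looks like the following; the installer URL below is the standard one from the Anaconda download page, so verify it there before running:

```bash
# Download the official Miniconda installer for Linux x86_64 and install it under $HOME/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"
# Make conda available in the current shell and in future shells
source "$HOME/miniconda3/bin/activate"
conda init bash
```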
Once you have installed miniconda, you can set up the environment by running the script we prepared:

```bash
bash scripts/setup_environment.sh
```

If you want to install the dependencies manually, you can follow the instructions below.
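In either case (script or manual installation), a quick sanity check such as the following can confirm that the main dependencies import correctly; this is our suggestion rather than part of the repository scripts, and it assumes the environment is named `semnav` as in the instructions:

```bash
conda activate semnav
# These imports cover the simulator, the navigation framework and PyTorch
python -c "import habitat_sim, habitat, torch; print('CUDA available:', torch.cuda.is_available())"
```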
Manual installation
Clone the repository and set up the environment:
```bash
# Clone this repository
git clone https://github.com/gramuah/semnav.git

# Create and activate the conda environment
conda create -n semnav python=3.9 cmake=3.18.0
conda activate semnav

# Install habitat-sim v0.2.2 (headless build)
git clone --depth 1 --branch v0.2.2 https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim/
pip install -r requirements.txt
python setup.py install --headless
cd ..

# Install PyTorch and the pinned Python dependencies
pip3 install torch torchvision torchaudio
pip install gym==0.22.0 urllib3==1.25.11 numpy==1.25.0 pillow==9.2.0

# Install habitat-lab
git clone https://github.com/carlosgual/habitat-lab.git
cd habitat-lab/
python setup.py develop --install
cd ..

# Extra dependencies and the semnav package itself
pip install wandb
conda install protobuf
pip install -e .  # install semnav in editable mode (run from the root of the cloned semnav repository)
```

We provide two datasets, SemNav 40 and SemNav 1630, for leveraging semantic segmentation information:
- SemNav 1630: Built using human-annotated semantic labels from HM3D Semantics.
- SemNav 40: Derived by mapping these annotations to the 40 categories of NYUv2.
| Dataset | Download Link |
|---|---|
| SemNav 40 | Download |
| SemNav 1630 | Download |
Additionally, download the ObjectNav HM3D episode dataset from this link.
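One possible way to organize the downloads is sketched below; the archive names and destination folders are assumptions (check the dataset paths expected by the configuration files), but the `data/` folder at the repository root is the one mounted in the Docker instructions that follow:

```bash
# Archive names and destination paths are placeholders -- adapt them to what you actually downloaded
mkdir -p data/datasets/objectnav
unzip semnav_40.zip -d data/datasets/objectnav/                 # SemNav 40
unzip semnav_1630.zip -d data/datasets/objectnav/               # SemNav 1630
unzip objectnav_hm3d_episodes.zip -d data/datasets/objectnav/   # ObjectNav HM3D episodes
```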
If you want to run the code in a Docker container (for example, to run it on a compute server, as we do), follow the instructions below. You will need a Docker installation with GPU support. We also use rootless containers, which means the container shares the same user as the host; that is why you first need to put your user name and user id in the Dockerfile (lines 42-43). You can get your user id by running `id -u`.
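Both values can be printed with the standard `id` utility:

```bash
id -un   # user name to copy into the Dockerfile
id -u    # numeric user id to copy into the Dockerfile
```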
```bash
docker build -t semnav:latest -f docker/Dockerfile .
```

This builds the Docker image with the entrypoint prepared to run the training script. You can modify the entrypoint to run other scripts, for example the evaluation script, but you will need to rebuild the image.
Run the container with:

```bash
# Notes on the options:
#   the first -v mounts the data folder
#   the second -v mounts the code folder, so you can modify the code locally and still deploy it via Docker
#   NVIDIA_VISIBLE_DEVICES selects specific GPUs on multi-GPU systems
#   WANDB_API_KEY is only needed if you use wandb; otherwise omit it
docker run \
  -v /home/your_username/local_path_to_your_data/:/home/your_username/code/data \
  -v /home/your_username/local_path_to_your_code/semnav:/home/your_username/code \
  --env NVIDIA_VISIBLE_DEVICES=5,6 \
  --env WANDB_API_KEY=your_api_key \
  --name semnav_container \
  --runtime=nvidia \
  semnav
```

To open a shell inside the running container:

```bash
docker exec -it semnav_container /bin/bash
```

To stop the container:

```bash
docker stop semnav_container
```

The Dockerfile sets up the complete environment, including:
- CUDA and cuDNN for GPU support
- Conda for environment management
- Habitat-Sim and Habitat-Lab for simulation tasks
- Essential Python libraries: PyTorch, torchvision, torchaudio
Ensure the entry script `entrypoint.sh` is executable.
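If it is not, the usual fix is a `chmod`; the `docker/` location below is an assumption based on where the Dockerfile lives, so adjust the path if the script is stored elsewhere:

```bash
# Path is assumed; point it at wherever entrypoint.sh actually lives in the repo
chmod +x docker/entrypoint.sh
```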
We provide checkpoints for models trained on the SemNav 40 dataset in three configurations:
- RGBS (IL): Trained using imitation learning (IL) with RGB and semantic segmentation inputs.
- RGBS (IL+RL): Trained using a combination of imitation learning (IL) and reinforcement learning (RL) with RGB and semantic segmentation inputs.
- OS: Trained using only semantic segmentation inputs.
| Configuration | Description | Download Link |
|---|---|---|
| RGBS (IL) | RGB + Semantic Segmentation (IL) | Download |
| RGBS (IL+RL) | RGB + Semantic Segmentation (IL + RL) | Download |
| OS | Only Semantic Segmentation | Download |
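The evaluation instructions below refer to a `pretrained_ckpt` folder, so one convenient (but not mandatory) way to organize the downloaded checkpoints is to collect them there; the file names below are placeholders:

```bash
# Placeholder file names -- keep whatever names the downloaded checkpoints actually have
mkdir -p pretrained_ckpt
mv ~/Downloads/semnav40_rgbs_il.pth ~/Downloads/semnav40_rgbs_il_rl.pth ~/Downloads/semnav40_os.pth pretrained_ckpt/
```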
To train a model from scratch, run (with the conda environment activated):
```bash
bash scripts/launch_training.sh
```

The training dataset is available in the PirlNav repository.
Modify the training configuration in:
`configs/experiments/il_objectnav.yaml`

There you can choose between three policies (a quick way to locate the relevant entry is sketched after this list):

- `SEMANTIC_ObjectNavILMAEPolicy`: Uses only semantic segmentation.
- `SEMANTIC_RGB_ObjectNavILMAEPolicy`: Uses both semantic segmentation and RGB.
- `RGB_ObjectNavILMAEPolicy`: Uses only RGB.
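The exact YAML key that holds the policy name depends on the configuration layout, so rather than guessing it here, a quick way to locate (and then edit) the active policy is:

```bash
# Find where the policy name is set, then change it to one of the three policies listed above
grep -n "ObjectNavILMAEPolicy" configs/experiments/il_objectnav.yaml
```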
Pretrained visual encoder weights can be downloaded from the PirlNav repository. The main training hyperparameters we use are summarized below:
| Parameter | Value |
|---|---|
| Number of GPUs | 8 |
| Number of environments per GPU | 16 |
| Rollout length | 64 |
| Number of mini-batches per epoch | 2 |
| Optimizer | Adam |
| Learning rate scheduler | Cyclic LR (exp_range) |
| Base learning rate | 1×10⁻⁵ |
| Maximum learning rate | 1×10⁻³ |
| Step size up | 2000 |
| Exponential decay factor (γ) | 0.99994 |
| DDPIL sync fraction | 0.6 |
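Put together, these defaults mean that each policy update collects roughly 8 GPUs × 16 environments × 64 steps = 8,192 environment frames, which are then split into 2 mini-batches per epoch.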
With the conda environment activated, run the evaluation with:

```bash
bash scripts/launch_eval.sh
```

To evaluate pretrained models, select a checkpoint from `pretrained_ckpt`.
For further information, refer to our paper or visit our Group Page.

