INVITATION: A Framework for Enhancing UAV Image Semantic Segmentation Accuracy through Depth Information Fusion
English | 简体中文
Official implementation of INVITATION, a novel framework for UAV image semantic segmentation through depth fusion, published in IEEE GRSL.
INVITATION takes only original UAV imagery as input, yet it obtains complementary depth information and fuses it into RGB semantic segmentation models effectively, thereby improving UAV semantic segmentation accuracy. Concretely, the framework supports two distinct depth generation approaches:
- Multi-View Stereo (MVS): High-precision depth reconstruction from UAV video sequences or multi-view UAV images
- Monocular Depth Estimation: Depth prediction from individual images using pretrained models (see the sketch below)
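For the monocular route, a minimal sketch of how a depth map could be produced for a single UAV image is shown below, using ZoeDepth through `torch.hub`; the model variant (`ZoeD_N`), the file paths, and the 16-bit PNG export are our own assumptions for illustration and are not part of this repository.

```python
import cv2
import numpy as np
import torch
from PIL import Image

# Hedged sketch: predict a depth map for one UAV frame with a pretrained monocular model.
# Assumes the ZoeDepth torch.hub entry point is reachable and its dependencies (e.g. timm) are installed.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True).to(device).eval()

rgb = Image.open("data/uavid/RGB/example.png").convert("RGB")   # illustrative path
depth = model.infer_pil(rgb)                                    # HxW numpy array of depth values

# Normalize and store as a 16-bit PNG so it can sit next to the RGB image under Depth/
depth16 = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 65535).astype(np.uint16)
cv2.imwrite("data/uavid/Depth/example.png", depth16)
```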
Key Results on UAVid Dataset:
Method | mIoU (%) | Improvement |
---|---|---|
Baseline (RGB) | 66.02 | - |
+ MVS Depth | 70.57 | ↑ 4.55 |
+ Monocular Depth | 69.69 | ↑ 3.67 |
🧩 *Figure 1: Architecture of INVITATION Framework.*
🧩 *Figure 2: Comparison of semantic segmentation results on UAVid dataset.*
- Python 3.7+
- pytorch
- gdal
- numpy
- opencv ...
- Clone the repository:
```bash
git clone https://github.com/CVEO/INVITATION.git
cd INVITATION
```
- Dataset Preparation:
Download the UAVid Dataset and organize it as follows:
```
/data/
└── uavid/
    ├── Depth/   # Depth maps (MVS or monocular depth estimation)
    ├── RGB/     # Original UAV images
    └── Label/   # Segmentation labels
```
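As an illustration of how this layout is typically consumed, the hedged sketch below pairs RGB, depth, and label files by file name; the real loader is `dataloader/UAVDataset.py`, and its behavior (naming, label encoding, normalization, augmentation) may well differ.

```python
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class RGBDSegDatasetSketch(Dataset):
    """Hypothetical loader: assumes RGB/, Depth/ and Label/ share identical file names."""

    def __init__(self, root="/data/uavid"):
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "RGB")))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = cv2.cvtColor(cv2.imread(os.path.join(self.root, "RGB", name)), cv2.COLOR_BGR2RGB)
        depth = cv2.imread(os.path.join(self.root, "Depth", name), cv2.IMREAD_UNCHANGED)
        # Assumes labels are stored as single-channel class-index maps;
        # UAVid's color-coded labels would first need converting to indices.
        label = cv2.imread(os.path.join(self.root, "Label", name), cv2.IMREAD_GRAYSCALE)

        rgb = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0      # 3xHxW in [0, 1]
        depth = torch.from_numpy(depth.astype(np.float32)).unsqueeze(0)   # 1xHxW raw depth
        depth = depth / (depth.max() + 1e-8)                              # simple per-image scaling
        label = torch.from_numpy(label.astype(np.int64))                  # HxW class indices
        return rgb, depth, label
```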
- Training:
First, configure the training settings in `configs.py`.
Then, start training with `python train.py`.
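For orientation only, the kind of fields such a config typically exposes might look like the hypothetical excerpt below; every name and value here is an assumption, and the authoritative options are the ones actually defined in `configs.py`.

```python
# Hypothetical configs.py-style settings; names and values are illustrative only.
class Config:
    data_root = "/data/uavid"     # folder containing RGB/, Depth/ and Label/
    depth_source = "mvs"          # "mvs" or "monocular", depending on how Depth/ was produced
    num_classes = 8               # UAVid defines 8 semantic classes
    crop_size = (1024, 1024)      # training crop (height, width)
    batch_size = 4
    lr = 6e-5
    epochs = 200
    output_dir = "outputs/"       # logs and checkpoints

config = Config()
```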
```
INVITATION
├── configs.py              # Training configurations
├── dataloader/             # UAVid data loader and data augmentation
│   ├── dataloader.py
│   └── UAVDataset.py
├── models/                 # Model architectures
│   ├── attention.py
│   ├── encoder_decoder.py
│   └── builder.py
├── utils/                  # Utility scripts
│   ├── loss.py
│   ├── visualize.py
│   └── ...
├── outputs/                # Training logs & checkpoints
└── README.md               # Documentation
```
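To give a flavor of the cross-modal feature enhancement that a module such as `models/attention.py` is responsible for, the sketch below shows one generic way to re-weight RGB features with depth-derived channel attention; it is our own simplified illustration under stated assumptions, not the INVITATION module itself.

```python
import torch
import torch.nn as nn

class DepthGuidedChannelFusion(nn.Module):
    """Generic sketch: depth features gate the RGB channels, then the two streams are merged.
    Illustrative only; the paper's actual fusion module may differ substantially."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        attn = self.gate(depth_feat)              # (B, C, 1, 1) channel weights from depth
        enhanced = rgb_feat * attn + rgb_feat     # depth-guided re-weighting with a residual path
        return self.merge(torch.cat([enhanced, depth_feat], dim=1))

# Quick shape check with random feature maps
if __name__ == "__main__":
    fuse = DepthGuidedChannelFusion(channels=64)
    out = fuse(torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128))
    print(out.shape)  # torch.Size([2, 64, 128, 128])
```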
If you find our work useful, please consider citing:
```bibtex
@ARTICLE{10858079,
  author={Zhang, Xiaodong and Zhou, Wenlin and Chen, Guanzhou and Wang, Jiaqi and Yang, Qingyuan and Tan, Xiaoliang and Wang, Tong and Chen, Yifei},
  journal={IEEE Geoscience and Remote Sensing Letters},
  title={INVITATION: A Framework for Enhancing UAV Image Semantic Segmentation Accuracy through Depth Information Fusion},
  year={2025},
  volume={},
  number={},
  pages={1-1},
  keywords={Autonomous aerial vehicles;Semantic segmentation;Feature extraction;Training;Decoding;Accuracy;Depth measurement;Semantics;Data models;Vectors;Depth Information Fusion;Unmanned Aerial Vehicles (UAVs);Semantic Segmentation;Cross-modal Feature Enhancement;Vision Transformers (ViTs)},
  doi={10.1109/LGRS.2025.3534994}}
```
This project is released under the Non-Commercial Academic License. For commercial use, please contact the authors.
- UAVid Dataset: https://uavid.nl/
- MVS Implementation: COLMAP (https://colmap.github.io/)
- Monocular Depth Estimation:
  - Monodepth2 (https://github.com/nianticlabs/monodepth2)
  - ZoeDepth (https://github.com/isl-org/ZoeDepth)
  - Depth Anything (https://github.com/LiheYoung/Depth-Anything)
- Base Segmentation Code: https://github.com/huaaaliu/RGBX_Semantic_Segmentation