Welcome to the implementation of Vision-and-Language Navigation in Continuous Environments (VLN-CE) models in the AI2THOR simulator. This project aims to push the boundaries of embodied AI by integrating recent advances in the field, leveraging state-of-the-art object detection, and developing robust navigation strategies.
Vision-and-Language Navigation (VLN) tasks challenge agents to interpret natural language instructions and navigate complex, photorealistic environments. This repository focuses on:
- Implementing VLN-CE models in the AI2THOR simulation environment, with data collected either manually (interactive keyboard navigation through the scenes) or generated automatically; a minimal manual-control sketch follows this list.
- Integrating real-time object detection and scene understanding using YOLOv5s.
- Exploring and optimizing advanced navigation strategies.
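To give a feel for the manual collection loop, here is a minimal sketch that maps a few keys to AI2THOR actions with pynput. The key bindings and printed output are illustrative assumptions, not the repository's actual manual_navigator.py:

```python
# Minimal keyboard-driven navigation sketch (illustrative; not the repo's manual_navigator.py).
from ai2thor.controller import Controller
from pynput import keyboard

controller = Controller(scene="FloorPlan1")  # any AI2THOR scene name works here

# Hypothetical key bindings; adjust to taste.
KEY_TO_ACTION = {
    "w": "MoveAhead",
    "s": "MoveBack",
    "a": "RotateLeft",
    "d": "RotateRight",
}

def on_press(key):
    try:
        action = KEY_TO_ACTION.get(key.char)
    except AttributeError:
        # Special keys (Esc, arrows, ...) have no .char; stop listening on Esc.
        return key != keyboard.Key.esc
    if action:
        event = controller.step(action=action)
        # event.frame holds the RGB observation (numpy array) you can log for a dataset.
        print(action, "success:", event.metadata["lastActionSuccess"])
    return True

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()
```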
- Real-time Object Detection: Utilizes pretrained models (e.g., YOLOv5s) for fast and accurate object recognition.
- Custom Navigation Strategies: Implements and compares Greedy Lookahead, Lin-Kernighan, Dynamic Programming, and A* algorithms, all informed by detected objects.
- Action Sequence Optimization: Seeks to minimize the number of actions required to reach the destination.
- Modular Dataset Handling: Supports custom dataset generation and fine-tuning.
- Extensible Framework: Designed for easy integration of new models, strategies, and research ideas.
- AI2THOR: Interactive 3D environment for embodied AI research.
- YOLOv5s: Real-time object detection (see the detection sketch after this list).
- PyTorch: Deep learning framework.
- OpenCV: Image processing and computer vision.
- Pynput: Keyboard control for manual navigation.
- Tkinter: GUI for object selection.
- NumPy, Pandas, Matplotlib, Seaborn: Data handling and visualization.
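For the detection side of this stack, the sketch below loads stock YOLOv5s weights through torch.hub and runs them on a single AI2THOR frame; the project's custom_detect.py may load weights and post-process results differently:

```python
# Run pretrained YOLOv5s on a single AI2THOR frame (illustrative sketch).
import torch
from ai2thor.controller import Controller

# Stock YOLOv5s via torch.hub; the project may instead load local or fine-tuned weights.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

controller = Controller(scene="FloorPlan1")
event = controller.step(action="RotateRight")

# event.frame is an RGB numpy array; YOLOv5 accepts it directly.
results = model(event.frame)
detections = results.pandas().xyxy[0]  # columns: xmin, ymin, xmax, ymax, confidence, class, name
print(detections[["name", "confidence"]])
```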
- Greedy Lookahead: Selects the next action based on immediate reward and detected objects.
- Lin-Kernighan Heuristic: Applies advanced local search for path optimization.
- Dynamic Programming: Finds optimal action sequences by breaking down navigation into subproblems.
- A*: Uses heuristic search to reach the goal efficiently (see the sketch after this list).
- Customizable: Easily add or modify navigation algorithms.
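As a concrete example of one strategy, here is a minimal A* search over the discrete reachable positions AI2THOR exposes via the GetReachablePositions action, using a Manhattan-distance heuristic. The unit step cost and the placeholder start/goal are simplifying assumptions, not the project's exact implementation:

```python
# Minimal A* over AI2THOR's reachable-position grid (illustrative sketch).
import heapq
from ai2thor.controller import Controller

GRID = 0.25  # default AI2THOR grid size

def astar(start, goal, reachable):
    """A* on (x, z) grid cells; start/goal are (x, z) tuples snapped to the grid."""
    def h(p):  # Manhattan-distance heuristic, measured in grid steps
        return (abs(p[0] - goal[0]) + abs(p[1] - goal[1])) / GRID

    open_heap = [(h(start), 0.0, start, [start])]
    visited = set()
    while open_heap:
        _, g, pos, path = heapq.heappop(open_heap)
        if pos == goal:
            return path
        if pos in visited:
            continue
        visited.add(pos)
        for dx, dz in [(GRID, 0), (-GRID, 0), (0, GRID), (0, -GRID)]:
            nxt = (round(pos[0] + dx, 2), round(pos[1] + dz, 2))
            if nxt in reachable and nxt not in visited:
                heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path found

controller = Controller(scene="FloorPlan1")
positions = controller.step(action="GetReachablePositions").metadata["actionReturn"]
reachable = {(round(p["x"], 2), round(p["z"], 2)) for p in positions}

start, goal = next(iter(reachable)), max(reachable)  # placeholder endpoints for the demo
print(astar(start, goal, reachable))
```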
- AI2THOR Scenes: Rich, interactive environments for training and evaluation.
- Custom Dataset Generation: Scripts for collecting RGB, depth, segmentation, and metadata (a minimal sketch follows this list).
- YOLOv5 Labeling: Automated label generation for object detection fine-tuning.
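A rough sketch of per-step collection, assuming the Controller is created with depth and instance-segmentation rendering enabled; the file names and step counter are illustrative, not the repository's exact scripts:

```python
# Save RGB, depth, instance segmentation, and metadata for one step (illustrative sketch).
import json
import numpy as np
from PIL import Image
from ai2thor.controller import Controller

controller = Controller(
    scene="FloorPlan1",
    renderDepthImage=True,             # enables event.depth_frame
    renderInstanceSegmentation=True,   # enables event.instance_segmentation_frame
)

event = controller.step(action="MoveAhead")
step_id = 0  # hypothetical step counter

Image.fromarray(event.frame).save(f"rgb_{step_id:05d}.png")
np.save(f"depth_{step_id:05d}.npy", event.depth_frame)
Image.fromarray(event.instance_segmentation_frame).save(f"seg_{step_id:05d}.png")
with open(f"meta_{step_id:05d}.json", "w") as f:
    json.dump(event.metadata["objects"], f)  # object poses, types, visibility, ...
```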
- Clone the Repository
git clone https://github.com/yourusername/VLN-CE_pro.git
cd VLN-CE_pro
- Set Up the Environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Download Pretrained Weights
- Place YOLOv5s weights in the appropriate directory (see yolov5/).
- Run Manual Navigation & Data Collection
python yolov5/src/scripts/manual_navigator.py
- Fine-tune YOLOv5s on Custom Data
python yolov5/src/fine-tune_yolov5s.py
- Run Detection or Navigation Experiments
- See scripts in yolov5/src/ for detection and navigation; a minimal sketch of loading fine-tuned weights follows below.
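After fine-tuning, the custom checkpoint can be loaded the same way as the stock model. The runs/train/exp/weights/best.pt path below is YOLOv5's usual output location and is only an assumption about where your run stores its weights:

```python
# Load fine-tuned YOLOv5 weights for the detection/navigation experiments (illustrative).
import torch

# 'runs/train/exp/weights/best.pt' is YOLOv5's typical output path; adjust to your run.
model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")
results = model("some_ai2thor_frame.png")  # accepts file paths, numpy arrays, or PIL images
results.print()
```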
VLN-CE_pro/
├── yolov5/
│   ├── src/
│   │   ├── scripts/
│   │   │   └── manual_navigator.py
│   │   ├── custom_detect.py
│   │   └── fine-tune_yolov5s.py
│   ├── runs/
│   ├── models/
│   └── ...
├── notes/
├── requirements.txt
└── README.md
- Transformer-based VLN Models: Integrate recent advances like VLN-BERT, EnvDrop, and HAMT.
- Vision-Language Pretraining: Leverage large-scale pretrained models (e.g., CLIP, BLIP) for improved grounding.
- Curriculum Learning: Gradually increase task difficulty for more robust agents.
- Uncertainty Estimation: Incorporate Bayesian methods for safer navigation.
- Multi-modal Fusion: Combine audio, depth, and semantic maps for richer perception.
- Reinforcement Learning Enhancements: Explore curiosity-driven exploration and hierarchical RL.
- Sim2Real Transfer: Bridge the gap between simulation and real-world deployment.
- Instruction Error Detection: Build on ideas from "Mind the Error!: Detection and Localization of Instruction Errors in Vision-and-Language Navigation".
- AI2THOR team for the simulation environment.
- Ultralytics YOLOv5 for object detection.
- Open-source contributors and the embodied AI research community.
Contact: For questions or collaborations, please open an issue or reach out via Telegram.
