- [2025.06.26] Our paper HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics has been accepted to ICCV'25 🚀.
- [2024.08.24] ⌨️ Our short paper BREASE: Bridging Episodes and Semantics, A Novel Framework for Long-Form Video Understanding has been accepted by the EVAL-FoMo workshop at ECCV'24.
You can set up the environment by running:

```bash
git clone https://github.com/joslefaure/HERMES.git
cd HERMES
pip install -e .
```
Additionally, our modules can be plugged into other VLMs for faster inference and improved memory management.
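As a rough illustration of what such an integration could look like (the module name, shapes, and API below are our assumptions for the sketch, not the repo's actual interface), a HERMES-style episodic module would sit between the VLM's vision encoder and its language model, compressing the frame-token stream before it reaches the LLM:

```python
import torch
import torch.nn as nn

class EpisodicCompressor(nn.Module):
    """Hypothetical stand-in for a HERMES-style module: cross-attention
    pools a long stream of frame tokens into a few episode tokens."""
    def __init__(self, dim: int, num_episodes: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_episodes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, num_frames * tokens_per_frame, dim)
        q = self.queries.unsqueeze(0).expand(frame_tokens.size(0), -1, -1)
        compressed, _ = self.attn(q, frame_tokens, frame_tokens)
        return compressed  # (batch, num_episodes, dim), fed to the LLM

# Usage: compress 10k video tokens down to 32 before the language model.
tokens = torch.randn(1, 10_000, 768)
print(EpisodicCompressor(768)(tokens).shape)  # torch.Size([1, 32, 768])
```

Compressing before the LLM is where the memory savings come from in this sketch: the language model attends over 32 episode tokens instead of thousands of frame tokens.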
- Download the train data (if you want to finetune HERMES) from here and the test data from here.
- Extract the frames at 10 FPS and organize them as follows (one way to do the extraction is sketched after the tree):
```
data
└── moviecore
    ├── annotation
    └── frames
        └── {video_id}
            ├── frame000001.jpg
            ├── ...
```
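If you need a starting point for the extraction step, here is a minimal sketch. It assumes ffmpeg is on your PATH and that the raw videos live under a hypothetical `data/moviecore/videos/` directory; adjust paths and extensions to your setup.

```python
# Hypothetical helper (not part of the repo): extract frames at 10 FPS
# with ffmpeg into the layout shown above.
import subprocess
from pathlib import Path

def extract_frames(video_path: Path, out_root: Path, fps: int = 10) -> None:
    out_dir = out_root / video_path.stem          # frames/{video_id}/
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", str(video_path),
            "-vf", f"fps={fps}",                  # sample at 10 FPS
            "-start_number", "1",
            str(out_dir / "frame%06d.jpg"),       # frame000001.jpg, ...
        ],
        check=True,
    )

for video in Path("data/moviecore/videos").glob("*.mp4"):
    extract_frames(video, Path("data/moviecore/frames"))
```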
| Dataset | Download Link |
|---|---|
| MovieCORE | GDrive / HuggingFace |
| MovieChat-1k | GDrive / HuggingFace |
| LVU | GDrive (Coming soon) |
| Breakfast | GDrive (Coming soon) |
| COIN | GDrive (Coming soon) |
We run inference on 4 V100 GPUs (32GB), though a single GPU is enough.
First, add your OpenAI API key to your environment with `export OPENAI_API_KEY='sk-*****'` (needed only for the MovieChat dataset, where we use GPT-3.5 for scoring). For the other datasets, we report top-1 accuracy.
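For reference, this kind of GPT-based scoring usually follows the LLM-as-judge pattern sketched below. This is a hedged illustration using the `openai` Python client; the repo's actual prompt, model settings, and answer parsing may differ.

```python
# Hypothetical illustration (not the repo's exact prompt): ask GPT-3.5
# whether the predicted answer matches the ground truth.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_answer(question: str, answer: str, prediction: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You evaluate video QA. Reply with 'yes' or 'no' "
                        "and a score from 0 to 5."},
            {"role": "user",
             "content": f"Question: {question}\n"
                        f"Correct answer: {answer}\n"
                        f"Predicted answer: {prediction}"},
        ],
    )
    return response.choices[0].message.content

print(score_answer("Who opens the door?", "The butler", "A butler opens it"))
```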
```bash
# Zero-shot
bash run_scripts/moviecore/test.sh

# Fully-supervised
bash run_scripts/moviecore/test.sh path/to/your/model.pth
```
The same applies to the other datasets; all the scripts are included in `run_scripts/`.
We train the model on 8 V100 GPUs (32GB).
```bash
bash run_scripts/{dataset}/train.sh
```
If you find our code or our paper useful for your research, please ★ star this repo and cite the following paper:
```bibtex
@misc{faure2024hermestemporalcoherentlongformunderstanding,
      title={HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics},
      author={Gueter Josmy Faure and Jia-Fong Yeh and Min-Hung Chen and Hung-Ting Su and Shang-Hong Lai and Winston H. Hsu},
      year={2024},
      eprint={2408.17443},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.17443},
}
```
We thank the authors of the following repositories for open-sourcing their code.
Icon made by Freepik from www.flaticon.com