This repository accompanies the paper End-to-End Open-Vocabulary Video Visual Relationship Detection using Multi-modal Prompting (EOV-MMP). The provided link contains the datasets and the models obtained from the first three training steps. The complete training and testing code will be released after the paper is officially published.
Make the end-to-end model ready for inference (target: before Oct 30, 2025. I have been extremely busy with my internship and autumn recruitment recently, so I have not had time to organize the code. If you need the code, please send an email to [email protected].)
Prepare the object detection part for training.
Prepare the relationship detection part for training.
Prepare for end-to-end training.