
Official PyTorch Implementation of Exp-VQA (Pattern Recognition 2025).

Exp-VQA: Fine-grained Facial Expression Analysis via Visual Question Answering
Yujian Yuan, Jiabei Zeng, Shiguang Shan
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

📰 News

[2025.4.28] Exp-VQA is accepted by Pattern Recognition 2025 (IF: 7.5)! 🎉
[2024.11.24] Training and test code of Exp-VQA is available.
[2024.11.24] Synthesized VQA pairs used for training are available.
[2024.10.24] Code and trained models will be released here. Welcome to watch this repository for the latest updates.
[2024.10.24] This work is an extension of our preliminary work Exp-BLIP.

⬇️ Data and Models Download

(1) Synthesized VQA pairs

VQA type Link
Global facial expression captioning (Q1) OneDrive
Local facial action captioning (Q2) OneDrive
Single AU detection (Q3) OneDrive
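
The downloaded VQA pairs are annotation files used for training. A minimal sketch for inspecting one of them, assuming the download is a JSON file (the filename below is hypothetical; use whatever file the OneDrive link provides):

import json

# hypothetical filename; replace with the actual downloaded annotation file
with open("global_expression_captioning_q1.json", "r") as f:
    pairs = json.load(f)

print(type(pairs), len(pairs))
# print one entry to see how the questions and answers are stored
print(pairs[0] if isinstance(pairs, list) else next(iter(pairs.items())))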

(2) Trained models

Model Link
Exp-VQA OneDrive
Exp-VQA(fz) OneDrive
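
To check a downloaded checkpoint before wiring it into test.py, a small sketch assuming the file is a standard torch.save checkpoint (the key layout may differ):

import torch

# same path as checkpoint_path in test.py below
state = torch.load("./exp_vqa_trimmed.pth", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    # list a few top-level keys (parameter names or wrapper keys such as "model")
    print(list(state.keys())[:5])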

🔨 Installation

1. (Optional) Create a conda environment
conda create -n expvqa python=3.8.12
conda activate expvqa

2. Clone this repo.
git clone https://github.com/Yujianyuan/Exp-VQA.git
cd Exp-VQA

3. Install the packages listed in requirements.txt
pip install -r requirements.txt
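
Optionally, a quick sanity check that PyTorch was installed and can see the GPU before moving on:

import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())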

🚀 Getting started

(1) Training

Complete the following two steps in order to train the model.

  1. Fill in the fields marked 'TODO' in Exp-VQA/mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml

  2. Train Exp-VQA:

python -m torch.distributed.run --nproc_per_node=4 train.py --cfg-path mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml
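
If only one GPU is available, the same entry point should still work by lowering --nproc_per_node (an assumption based on how torch.distributed.run launches processes, not a configuration tested here); you may also need to reduce the batch size in the yaml:

python -m torch.distributed.run --nproc_per_node=1 train.py --cfg-path mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml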

(2) Test

  1. In test.py, fill in the image path and the model path:
import torch
from PIL import Image
from mylavis.models import my_load_model_and_preprocess

# load sample image
raw_image = Image.open("figs/happy.jpg").convert("RGB")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# loads Exp-VQA model
# this also loads the associated image processors
checkpoint_path = './exp_vqa_trimmed.pth'
model, vis_processors, _ = my_load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b",
    dict_path=checkpoint_path, is_eval=True, device=device)
# preprocess the image
# vis_processors stores image transforms for "train" and "eval" 
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# input your question
question = "How can this person's emotion be inferred from their facial actions?"

# generate answer
print('[1 answer]:', model.generate({"image": image, "prompt": question}))

# use nucleus sampling for diverse outputs
print('[3 answers]:', model.generate({"image": image, "prompt": question},
                                     use_nucleus_sampling=True, num_captions=3))

Then run it to get the answers:

python test.py
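
Beyond the single question above, the three VQA types from the training data (Q1-Q3) can be probed with different prompts. A sketch reusing the model and image loaded in test.py; the exact prompt wordings here are illustrative, not necessarily those used for training:

# illustrative prompts for the three VQA types; treat the wording as examples only
questions = [
    "What is the overall facial expression of this person?",  # Q1: global expression captioning
    "What facial actions can be observed on this face?",      # Q2: local facial action captioning
    "Is AU12 (lip corner puller) present on this face?",      # Q3: single AU detection
]
for q in questions:
    print(q, '->', model.generate({"image": image, "prompt": q}))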

✏️ Citation

If you find this work useful for your research, please feel free to leave a star⭐️ and cite our paper:

@article{yuan2025exp,
  title={Exp-VQA: fine-grained facial expression analysis via visual question answering},
  author={Yuan, Yujian and Zeng, Jiabei and Shan, Shiguang},
  journal={Pattern Recognition},
  pages={111783},
  year={2025},
  publisher={Elsevier}
}

🤝 Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62176248). We also thank the ICT computing platform for providing GPUs. We thank Salesforce Research for sharing the code of InstructBLIP via LAVIS. Our code is based on LAVIS.
