
Official PyTorch Implementation of Exp-VQA (Pattern Recognition 2025).

Exp-VQA: Fine-grained Facial Expression Analysis via Visual Question Answering
Yujian Yuan, Jiabei Zeng, Shiguang Shan
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

📰 News

[2025.4.28] Exp-VQA is accepted by Pattern Recognition 2025 (IF: 7.5)! 🎉
[2024.11.24] Training and test code of Exp-VQA is available.
[2024.11.24] Synthesized VQA pairs used for training are available.
[2024.10.24] Code and trained models will be released here. Welcome to watch this repository for the latest updates.
[2024.10.24] This work is an extension of our preliminary work Exp-BLIP.

⬇️ Data and Models Download

(1) Synthesized VQA pairs

VQA type Link
Global facial expression captioning (Q1) OneDrive
Local facial action captioning (Q2) OneDrive
Single AU detection (Q3) OneDrive
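
The downloaded VQA pairs are annotation files used for training. A minimal sketch for inspecting one of them, assuming the download is a JSON file (the filename below is hypothetical; use whatever file the OneDrive link provides):

import json

# hypothetical filename; replace with the actual downloaded annotation file
with open("global_expression_captioning_q1.json", "r") as f:
    pairs = json.load(f)

print(type(pairs), len(pairs))
# print one entry to see how the questions and answers are stored
print(pairs[0] if isinstance(pairs, list) else next(iter(pairs.items())))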

(2) Trained models

Model Link
Exp-VQA OneDrive
Exp-VQA(fz) OneDrive
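
To check a downloaded checkpoint before wiring it into test.py, a small sketch assuming the file is a standard torch.save checkpoint (the key layout may differ):

import torch

# same path as checkpoint_path in test.py below
state = torch.load("./exp_vqa_trimmed.pth", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    # list a few top-level keys (parameter names or wrapper keys such as "model")
    print(list(state.keys())[:5])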

🔨 Installation

1. (Optional) Create a conda environment
conda create -n expvqa python=3.8.12
conda activate expvqa

2. Clone this repo.
git clone https://github.com/Yujianyuan/Exp-VQA.git
cd Exp-VQA

3. Install the packages listed in requirements.txt
pip install -r requirements.txt
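
Optionally, a quick sanity check that PyTorch was installed and can see the GPU before moving on:

import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())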

🚀 Getting started

(1) Training

Complete the following two steps in order to train the model.

  1. Fill in the fields marked 'TODO' in Exp-VQA/mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml

  2. Train Exp-VQA:

python -m torch.distributed.run --nproc_per_node=4 train.py --cfg-path mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml
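
If only one GPU is available, the same entry point should still work by lowering --nproc_per_node (an assumption based on how torch.distributed.run launches processes, not a configuration tested here); you may also need to reduce the batch size in the yaml:

python -m torch.distributed.run --nproc_per_node=1 train.py --cfg-path mylavis/projects/blip2/train/vqa_ft_vicuna7b_vqa.yaml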

(2) Test

  1. In test.py, fill in the image path and the model path:
import torch
from PIL import Image
from mylavis.models import my_load_model_and_preprocess

# load sample image
raw_image = Image.open("figs/happy.jpg").convert("RGB")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# loads Exp-VQA model
# this also loads the associated image processors
checkpoint_path = './exp_vqa_trimmed.pth'
model, vis_processors, _ = my_load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b",
    dict_path=checkpoint_path, is_eval=True, device=device)
# preprocess the image
# vis_processors stores image transforms for "train" and "eval" 
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# input your question
question = "How can this person's emotion be inferred from their facial actions?"

# generate answer
print('[1 answer]:', model.generate({"image": image, "prompt": question}))

# use nucleus sampling for diverse outputs
print('[3 answers]:', model.generate({"image": image, "prompt": question},
                                     use_nucleus_sampling=True, num_captions=3))

Then run it to get the answers:

python test.py
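
Beyond the single question above, the three VQA types from the training data (Q1-Q3) can be probed with different prompts. A sketch reusing the model and image loaded in test.py; the exact prompt wordings here are illustrative, not necessarily those used for training:

# illustrative prompts for the three VQA types; treat the wording as examples only
questions = [
    "What is the overall facial expression of this person?",  # Q1: global expression captioning
    "What facial actions can be observed on this face?",      # Q2: local facial action captioning
    "Is AU12 (lip corner puller) present on this face?",      # Q3: single AU detection
]
for q in questions:
    print(q, '->', model.generate({"image": image, "prompt": q}))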

✏️ Citation

If you find this work useful for your research, please feel free to leave a star⭐️ and cite our paper:

@article{yuan2025exp,
  title={Exp-VQA: fine-grained facial expression analysis via visual question answering},
  author={Yuan, Yujian and Zeng, Jiabei and Shan, Shiguang},
  journal={Pattern Recognition},
  pages={111783},
  year={2025},
  publisher={Elsevier}
}

🤝 Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62176248). We also thank the ICT computing platform for providing GPUs. We thank Salesforce Research for sharing the code of InstructBLIP via LAVIS. Our code is based on LAVIS.
