Facial expression recognition (FER) plays a significant role in ubiquitous computer vision applications. We revisit this problem from a new perspective: can the image generation process yield representations that improve FER performance? To this end, we propose a novel generative method based on the image inversion mechanism for the FER task, termed Inversion FER (IFER). In particular, we devise a novel Adversarial Style Inversion Transformer (ASIT) for IFER that comprehensively extracts features of generated facial images. In addition, ASIT is equipped with an image inversion discriminator that measures the cosine similarity of semantic features between source and generated images, constrained by a distribution alignment loss. Finally, we introduce a feature modulation module that fuses the structural code and latent codes from ASIT for the downstream FER task. We extensively evaluate ASIT on facial datasets such as FFHQ and CelebA-HQ, showing that our approach achieves state-of-the-art facial inversion performance. IFER also achieves competitive results on facial expression recognition datasets such as RAF-DB, SFEW and AffectNet.
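To make the similarity constraint concrete, here is a minimal sketch of the cosine-similarity term between the semantic features of source and generated images that the image inversion discriminator relies on. This is not the official implementation; the feature extractor, dimensions and names are placeholders.

```python
# Minimal sketch (not the official implementation) of a cosine-similarity term
# between semantic features of the source image and the generated (inverted) image.
import torch
import torch.nn.functional as F

def semantic_similarity_loss(src_feats: torch.Tensor, gen_feats: torch.Tensor) -> torch.Tensor:
    # src_feats / gen_feats: (batch, dim) features from a shared semantic extractor (placeholder)
    cos = F.cosine_similarity(src_feats, gen_feats, dim=1)  # values in [-1, 1]
    return (1.0 - cos).mean()  # small when the generated image preserves the source semantics

# toy usage with random features
src, gen = torch.randn(4, 512), torch.randn(4, 512)
print(semantic_similarity_loss(src, gen))
```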
The architecture of ASIT is shown below:
More results:
2022.11.16: ASIT model, training and validation scripts released for FFHQ and CelebA-HQ image inversion. IFER model, training and validation scripts released for RAF-DB, SFEW and AffectNet image classification.
Please download the pre-trained models from the following links. Each ASIT model contains the entire ASIT architecture, including the encoder and decoder weights.
| Path | Description |
|---|---|
| FFHQ Inversion(code:asit) | ASIT trained with the FFHQ dataset for facial inversion. |
| CelebA-HQ Inversion(code:asit) | ASIT trained with the CelebA-HQ dataset for facial inversion. |
| RAF-DB checkpoint(code:ifer) | IFER checkpoint trained with RAF-DB for facial expression classification. |
If you wish to use one of the pretrained models for training or inference, you may do so using the flag `--checkpoint_path`.
In addition, we provide the various auxiliary models needed for training your own ASIT model from scratch.
| Path | Description |
|---|---|
| FFHQ StyleGAN | StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution. |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during ASIT training. |
| MoCo ResNet-50 | Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based similarity loss on non-facial domains. The model is taken from the official implementation. |
| CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation. |
| MTCNN | Weights for MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.) |
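The MTCNN weights come packaged as a tar.gz archive; for example (the archive name below is hypothetical, use whatever the downloaded file is called):

```
tar -xzvf mtcnn_weights.tar.gz -C pretrained_models/
```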
By default, we assume that all auxiliary models are downloaded and saved to the directory `pretrained_models`. However, you may use your own paths by changing the necessary values in `configs/paths_config.py`.
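The entries pointing at these auxiliary models look roughly like the following; the key and file names here are illustrative (pSp-style conventions) and may not match this repository exactly:

```python
# configs/paths_config.py -- illustrative layout only; actual keys/filenames may differ
model_paths = {
    'stylegan_ffhq': 'pretrained_models/stylegan2-ffhq-config-f.pt',
    'ir_se50': 'pretrained_models/model_ir_se50.pth',
    'moco': 'pretrained_models/moco_v2_800ep_pretrain.pth',
    'curricular_face': 'pretrained_models/CurricularFace_Backbone.pth',
    'mtcnn_pnet': 'pretrained_models/mtcnn/pnet.npy',
}
```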
- Currently, we provide support for numerous datasets and experiments.
- Refer to `configs/paths_config.py` to define the necessary data paths and model paths for ASIT training and evaluation.
- Refer to `configs/transforms_config.py` for the transforms defined for each dataset/experiment.
- Finally, refer to `configs/data_configs.py` for the source/target data paths for the train and test sets as well as the transforms.
- If you wish to experiment with your own dataset, you can simply make the necessary adjustments in `data_configs.py` to define your data paths and in `transforms_config.py` to define your own data transforms (a hypothetical example is sketched below).
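A hypothetical `data_configs.py` entry for a custom dataset might look like this; the field names follow the pSp-style layout this codebase builds on and may not match the repository exactly:

```python
# configs/data_configs.py -- hypothetical entry; the 'my_dataset_*' keys are placeholders
from configs import transforms_config
from configs.paths_config import dataset_paths

DATASETS = {
    'my_dataset_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['my_dataset_train'],
        'train_target_root': dataset_paths['my_dataset_train'],
        'test_source_root': dataset_paths['my_dataset_test'],
        'test_target_root': dataset_paths['my_dataset_test'],
    },
}
```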
As an example, assume we wish to run encoding using FFHQ (`dataset_type=ffhq_encode`).
We first go to `configs/paths_config.py` and define:
```
dataset_paths = {
    'ffhq': '/path/to/ffhq/images256x256',
    'celeba_test': '/path/to/CelebAMask-HQ/test_img',
}
```
The main training script can be found in scripts/train_asit.py.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.
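For example, the training logs can be monitored with:

```
tensorboard --logdir=/path/to/experiment/logs
```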
```
python scripts/train_asit.py \
--dataset_type=ffhq_encode \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1
```
To finetune the ASIT encoder on the RAF-DB dataset, we randomly selected 400 images from the RAF-DB training set as a validation set.
As an example, assume we wish to finetune on RAF-DB (`dataset_type=rafdb_encode`).
We first go to `configs/paths_config.py` and define:
```
dataset_paths = {
    'rafdb_train': '/path/to/rafdb/train/images256x256',
    'rafdb_val': '/path/to/rafdb/val/images256x256',
}
```
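One simple way to create the 400-image validation split mentioned above (directory paths are placeholders; adjust to your local RAF-DB layout):

```python
# Carve 400 random training images out into a held-out validation folder.
import os, random, shutil

src_dir = '/path/to/rafdb/train/images256x256'   # placeholder path
val_dir = '/path/to/rafdb/val/images256x256'     # placeholder path
os.makedirs(val_dir, exist_ok=True)

random.seed(0)
for name in random.sample(sorted(os.listdir(src_dir)), 400):
    # move (not copy) so the validation images are held out of training
    shutil.move(os.path.join(src_dir, name), os.path.join(val_dir, name))
```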
```
python scripts/train_asit.py \
--dataset_type=rafdb_encode \
--exp_dir=/path/to/finetune_experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--max_steps=20000 \
--checkpoint_path=/path/to/experiment/checkpoints/best_model.pt \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1
```
To train the IFER classifier on RAF-DB with multiple GPUs, launch `train_ifer.py` through `torch.distributed.launch`:
```
python -m torch.distributed.launch --nproc_per_node <num-of-gpus> --master_port 11223 train_ifer.py \
--config <configs/ifer.yml> --data_dir <rafdb-path> --batch-size <batch-size-per-gpu> --tag <run-tag>
```
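For reference, a script launched this way typically performs per-process distributed setup along the following lines; this is a generic sketch of `torch.distributed.launch` usage, not an excerpt from `train_ifer.py`:

```python
# Generic per-process setup under torch.distributed.launch (one process per GPU).
import argparse
import os
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
# older launcher versions pass --local_rank; newer ones set the LOCAL_RANK env var instead
parser.add_argument('--local_rank', type=int, default=int(os.environ.get('LOCAL_RANK', 0)))
args, _ = parser.parse_known_args()

torch.cuda.set_device(args.local_rank)   # bind this process to its GPU
dist.init_process_group(backend='nccl')  # rendezvous via the env vars set by the launcher
print(f'rank {dist.get_rank()} of {dist.get_world_size()} initialized')
```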
Having trained your model, you can use `scripts/inference.py` to apply the model to a set of images.
For example,
```
python scripts/inference.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--couple_outputs
```
- Calculating LPIPS loss:
```
python scripts/calc_losses_on_images.py \
--mode lpips \
--data_path=/path/to/experiment/inference_outputs \
--gt_path=/path/to/test_images
```
- Calculating L2 loss:
```
python scripts/calc_losses_on_images.py \
--mode l2 \
--data_path=/path/to/experiment/inference_outputs \
--gt_path=/path/to/test_images
```
- Calculating PSNR and SSIM:
```
python scripts/calc_psnr_ssim.py \
--inference_dir=/path/to/experiment/inference_outputs \
--real_dir=/path/to/test_images
```
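For reference, the PSNR/SSIM comparison above is conceptually similar to the following scikit-image snippet (file names are placeholders; the repository script may differ in resizing and alignment details):

```python
# PSNR/SSIM between one reconstructed image and its ground-truth counterpart.
# Requires scikit-image >= 0.19 for the channel_axis argument.
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

fake = np.array(Image.open('inference_outputs/0001.png').convert('RGB'))  # placeholder file
real_img = Image.open('test_images/0001.png').convert('RGB')              # placeholder file
real = np.array(real_img.resize(fake.shape[1::-1]))  # match the (width, height) of the output

psnr = peak_signal_noise_ratio(real, fake, data_range=255)
ssim = structural_similarity(real, fake, channel_axis=-1, data_range=255)
print(f'PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}')
```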
- Validate IFER:
```
python scripts/validate.py --model <model-name> --checkpoint <checkpoint-path> \
--data_dir <rafdb-path> --batch-size <batch-size-per-gpu>
```
If you use this code for your research, please cite our paper *More comprehensive facial inversion for more effective expression recognition*:





