
Batch Transformer: Look for Attention in Batch

📰 News

[2025.11.06] Batch Transformer is published in IEEE Access.
[2025.10.29] Batch Transformer is accepted for publication in IEEE Access.

✨ Overview

Facial expression recognition (FER) has received considerable attention in computer vision, particularly for "in-the-wild" settings such as human-computer interaction. However, in-the-wild FER images carry uncertainties such as occlusion, low resolution, pose variation, illumination variation, and annotator subjectivity, so some images do not match their target label. A single noisy image therefore provides minimal, unreliable information, which can significantly degrade FER performance. To address this issue, we propose the batch transformer (BT), built around the proposed class batch attention (CBA) module, which prevents overfitting to noisy data and extracts trustworthy information by training on features aggregated from several images in a batch instead of information from a single image. We also propose multi-level attention (MLA) to prevent overfitting to specific features by capturing the correlations between feature levels. Combining these components, we present the batch transformer network (BTN).

[Figure 2: Overview of the proposed Batch Transformer Network (BTN).]
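
The CBA code from BatchTransformer.py is reproduced below. After channel-wise positional encoding, queries and keys of shape (B, C, E) are regrouped per channel, and for each channel the scaled dot-product weights softmax(q k^T / sqrt(C)), of shape (B, B), mix the per-channel values across the batch, so each sample's representation is refined by related samples in the same batch.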

import math

import torch
import torch.nn as nn


class BatchTransformer(nn.Module):
    def __init__(self, channel):
        super().__init__()
        # ChannelPositionalEncoding is defined elsewhere in this repository.
        self.pos_encoder = ChannelPositionalEncoding(channel=channel)

    def forward(self, query, key, value):
        # Positionally encode queries and keys per channel: (B, C, 49)
        q, k = self.pos_encoder(query, key)
        output_CBA = ClassBatchAttention(q, k, value)
        # Residual connection around the attention output
        output_BT = output_CBA + value

        return output_CBA, output_BT


def ClassBatchAttention(q, k, value):

    # query/key: (B, C, E) -> (1, C, B, E), so attention runs across the batch
    q = q.contiguous().transpose(0, 1).unsqueeze(0)  # (1, C, B, 49)
    k = k.contiguous().transpose(0, 1).unsqueeze(0)  # (1, C, B, 49)

    # value: (B, C) -> (C, B, 1)
    v = value.unsqueeze(1)                # (B, 1, C)
    v = v.contiguous().transpose(0, -1)   # (C, 1, B)
    v = v.contiguous().transpose(-2, -1)  # (C, B, 1)

    _, C, B, _ = q.shape  # (1, C, B, 49)
    # Scaled dot-product attention over the batch dimension, per channel
    q_scaled = q / math.sqrt(C)
    attn_output_weights = q_scaled @ k.transpose(-2, -1)              # (1, C, B, B)
    attn_output_weights = torch.softmax(attn_output_weights, dim=-1)  # (1, C, B, B)
    attn_output_weights = attn_output_weights.contiguous().view(C, B, B)

    attn_output = torch.bmm(attn_output_weights, v)    # (C, B, 1)
    attn_output = attn_output.contiguous().view(C, B)  # (C, B)
    attn_output = attn_output.transpose(-1, -2)        # (B, C)
    return attn_output

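As a quick sanity check, ClassBatchAttention can be run standalone on random tensors. The shapes below (batch size B = 8, channel count C = 7, embedding size E = 49) are illustrative values, not settings taken from the paper:

import torch

B, C, E = 8, 7, 49        # illustrative shapes only
q = torch.randn(B, C, E)  # positionally encoded queries
k = torch.randn(B, C, E)  # positionally encoded keys
v = torch.randn(B, C)     # per-sample, per-channel values

out = ClassBatchAttention(q, k, v)
print(out.shape)  # torch.Size([8, 7]): one refined C-dimensional vector per sample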

🚀 Main Results

✨ Static Facial Expression Recognition

[Results on the RAF-DB, AffectNet-7, and AffectNet-8 datasets; see 📋 Reported Results and Checkpoints below.]

🔨 Installation

Run the following commands to create and activate the virtual environment and install the dependencies:

conda create -n BTN python=3.7.16
conda activate BTN
pip install -r requirements.txt

➡️ Data Preparation

  • Preparing Data

    As an example, assume we wish to run RAF-DB. Make sure the data has a structure like the following (see the loading sketch after this list):

     - data/raf-db/
     	 train/label (e.g., 1, 2, ..., 7)/
     	     train_00001_aligned.jpg
     	     train_00002_aligned.jpg
     	     ...
     	 test/label (e.g., 1, 2, ..., 7)/
     	     test_0001_aligned.jpg
     	     test_0002_aligned.jpg
     	     ...
    
  • Preparing Pretrained Models

    The following table provides the pre-trained checkpoints used in this paper. Put the entire pretrain folder under the models folder and modify the checkpoint paths in BTN_7cls.py and BTN_8cls.py under models, e.g., face_landback_checkpoint = torch.load(r'path/to/mobilefacenet checkpoint') and ir_checkpoint = torch.load(r'path/to/ir50 checkpoint').

    pre-trained checkpoint   ownCloud
    ir50                     download
    mobilefacenet            download
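
Because each split is laid out as one subfolder per label, a standard folder-based image loader can read it directly. A minimal sketch, assuming torchvision is installed and the RAF-DB layout from above; the transform values are common ImageNet defaults, not necessarily the ones used in this repository:

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Common ImageNet-style preprocessing; the repository's actual transforms may differ.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Each label subfolder (1, 2, ..., 7) becomes one class index.
train_set = datasets.ImageFolder('data/raf-db/train', transform=transform)
test_set = datasets.ImageFolder('data/raf-db/test', transform=transform)
print(len(train_set.classes))  # 7 expression classes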

📋 Reported Results and Checkpoints

The following table provides BTN checkpoints for each dataset.

dataset             top-1 acc (%)   ownCloud
RAF-DB              92.54           download
AffectNet (7 cls)   67.60           download
AffectNet (8 cls)   64.29           download

Test

You can evaluate our model on the RAF-DB dataset by running:

python main.py --data path/to/dataset --evaluate checkpoint/RAFDB92.54.pth

You can evaluate our model on the AffectNet (7 cls) dataset by running:

python main.py --data path/to/dataset --evaluate checkpoint/AffectNet7_67.60.pth

You can evaluate our model on the AffectNet (8 cls) dataset by running:

python main_8.py --data path/to/dataset --evaluate checkpoint/AffectNet8_64.29.pth

Train

You can train BTN on the RAF-DB dataset as follows:

python main.py --data path/to/dataset --data_type RAF-DB --lr 2e-5 --batch-size 64 --epochs 300 --gpu 0

You can train BTN on the AffectNet (7 cls) dataset as follows:

python main.py --data path/to/dataset --data_type AffectNet-7 --lr 0.8e-6 --batch-size 144 --epochs 100 --gpu 0

You can train BTN on the AffectNet (8 cls) dataset as follows:

python main_8cls.py --data path/to/dataset --lr 1e-6 --batch-size 144 --epochs 100 --gpu 0

You can resume training from a saved checkpoint by running:

python main.py --data path/to/dataset --resume checkpoint/RAFDB92.54.pth

☎️ Contact

If you have any questions, please feel free to reach out to me at [email protected].

👍 Acknowledgements

This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2022-00156295).

✏️ Citation

If you think this project is helpful, please feel free to leave a star ⭐️ and cite our paper.
