[2025.11.06] Batch Transformer is published in IEEE ACCESS
[2025.10.29] Batch Transformer is accepted for publication in IEEE ACCESS
Facial expression recognition (FER) has received considerable attention in computer vision, particularly for “in-the-wild” settings such as human-computer interaction. However, in-the-wild FER images contain uncertainties such as occlusion, low resolution, pose variation, illumination variation, and subjectivity, including expressions that do not match the target label. Consequently, little reliable information can be obtained from a single noisy image, which can significantly degrade the performance of the FER task. To address this issue, we propose a batch transformer (BT), built around the proposed class batch attention (CBA) module, which prevents overfitting to noisy data and extracts trustworthy information by training on features aggregated across the images in a batch rather than from a single image. We also propose multi-level attention (MLA) to prevent overfitting to specific features by capturing the correlations between feature levels. In this paper, we present a batch transformer network (BTN) that combines these two components.
```python
import math

import torch
import torch.nn as nn


class BatchTransformer(nn.Module):
    def __init__(self, channel):
        super().__init__()
        self.pos_encoder = ChannelPositionalEncoding(channel=channel)

    def forward(self, query, key, value):
        # Positionally encode the spatial features: B,C,49
        q, k = self.pos_encoder(query, key)
        # Attend across the batch, per channel, so each image's channel
        # descriptor is refined with information from the whole batch.
        output_CBA = ClassBatchAttention(q, k, value)
        # Residual connection back onto the original value.
        output_BT = output_CBA + value
        return output_CBA, output_BT


def ClassBatchAttention(q, k, value):
    # query: B,C,E -> 1,C,B,E (attention runs over the batch axis B)
    q = q.contiguous().transpose(0, 1).unsqueeze(0)  # 1,C,B,49
    k = k.contiguous().transpose(0, 1).unsqueeze(0)  # 1,C,B,49
    # value: B,C -> C,B,1
    v = value.unsqueeze(1)                # B,1,C
    v = v.contiguous().transpose(0, -1)   # C,1,B
    v = v.contiguous().transpose(-2, -1)  # C,B,1
    _, C, B, _ = q.shape                  # 1,C,B,49
    # Scaled dot-product over the batch axis (scaled by sqrt(C) here).
    q_scaled = q / math.sqrt(C)
    attn_output_weights = q_scaled @ k.transpose(-2, -1)              # 1,C,B,B
    attn_output_weights = torch.softmax(attn_output_weights, dim=-1)  # 1,C,B,B
    attn_output_weights = attn_output_weights.contiguous().view(C, B, B)
    attn_output = torch.bmm(attn_output_weights, v)     # C,B,1
    attn_output = attn_output.contiguous().view(C, B)   # C,B
    attn_output = attn_output.transpose(-1, -2)         # B,C
    return attn_output
```
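`ChannelPositionalEncoding` is defined elsewhere in the repo; the stub below is a hypothetical stand-in (it just flattens the 7×7 maps to length-49 embeddings and adds a learnable positional term) so the shape flow of `BatchTransformer` can be checked end to end:

```python
class ChannelPositionalEncoding(nn.Module):
    """Hypothetical stand-in for the repo's module: flatten 7x7 feature
    maps to length-49 embeddings and add a learnable positional term."""
    def __init__(self, channel):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, channel, 49))

    def forward(self, query, key):
        return query.flatten(2) + self.pos, key.flatten(2) + self.pos


bt = BatchTransformer(channel=512)
query = torch.randn(64, 512, 7, 7)  # B,C,7,7 spatial features
key = torch.randn(64, 512, 7, 7)
value = torch.randn(64, 512)        # B,C per-image channel descriptors
out_cba, out_bt = bt(query, key, value)
print(out_cba.shape, out_bt.shape)  # torch.Size([64, 512]) for both
```

Note that the attention weights have shape C,B,B: for each channel, every image in the batch attends to every other image, which is how a single noisy image borrows information from the rest of the batch.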
Run the following commands to create a virtual environment and install the dependencies:
```
conda create -n BTN python=3.7.16
conda activate BTN
pip install -r requirements.txt
```
## Preparing Data
As an example, assume we wish to run on RAF-DB. We need to make sure it has a structure like the following:
```
data/raf-db/
    train/
        <label>/          # e.g., 1, 2, ..., 7
            train_00001_aligned.jpg
            train_00002_aligned.jpg
            ...
    test/
        <label>/          # e.g., 1, 2, ..., 7
            test_0001_aligned.jpg
            test_0002_aligned.jpg
            ...
```
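This layout matches torchvision's `ImageFolder` convention (one subfolder per label), so you can sanity-check it with a sketch like the following (the transform is illustrative; the repo's actual preprocessing may differ):

```python
from torchvision import transforms
from torchvision.datasets import ImageFolder

# Illustrative transform only; the repo's training pipeline may differ.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = ImageFolder('data/raf-db/train', transform=transform)
print(len(train_set), train_set.classes)  # class folders '1'..'7'
```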
## Preparing Pretrained Models
The following table provides the pre-trained checkpoints used in this paper. Put the entire `pretrain` folder under the `models` folder and modify the checkpoint paths in `BTN_7cls.py` and `BTN_8cls.py` under `models`, as shown in the snippet after the table.

| pre-trained checkpoint | ownCloud |
|---|---|
| ir50 | download |
| mobilefacenet | download |
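These are the two lines to edit (the paths are placeholders; point them at wherever you saved the downloads):

```python
# Placeholder paths: replace with the locations of the downloaded files.
face_landback_checkpoint = torch.load(r'path/to/mobilefacenet checkpoint')
ir_checkpoint = torch.load(r'path/to/ir50 checkpoint')
```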
The following table provides BTN checkpoints in each dataset.
| dataset | top-1 acc (%) | ownCloud |
|---|---|---|
| RAF-DB | 92.54 | download |
| AffectNet (7 cls) | 67.60 | download |
| AffectNet (8 cls) | 64.29 | download |
You can evaluate our model on the RAF-DB dataset by running:
```
python main.py --data path/to/dataset --evaluate checkpoint/RAFDB92.54.pth
```
You can evaluate our model on the AffectNet (7 cls) dataset by running:
```
python main.py --data path/to/dataset --evaluate checkpoint/AffectNet7_67.60.pth
```
You can evaluate our model on the AffectNet (8 cls) dataset by running:
```
python main_8.py --data path/to/dataset --evaluate checkpoint/AffectNet8_64.29.pth
```
You can train BTN on the RAF-DB dataset by running:
```
python main.py --data path/to/dataset --data_type RAF-DB --lr 2e-5 --batch-size 64 --epochs 300 --gpu 0
```
You can train BTN on the AffectNet (7 cls) dataset by running:
```
python main.py --data path/to/dataset --data_type AffectNet-7 --lr 0.8e-6 --batch-size 144 --epochs 100 --gpu 0
```
You can train BTN on the AffectNet (8 cls) dataset by running:
```
python main_8cls.py --data path/to/dataset --lr 1e-6 --batch-size 144 --epochs 100 --gpu 0
```
You can resume training from a saved checkpoint by running:
```
python main.py --data path/to/dataset --resume checkpoint/RAFDB92.54.pth
```
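The resume logic in `main.py` is not reproduced here, but `--resume` typically restores the model weights, optimizer state, and epoch counter. A standard PyTorch pattern looks like the sketch below; the checkpoint keys are assumptions, not confirmed from the repo:

```python
import torch

def resume(model, optimizer, path):
    # Assumed checkpoint keys; main.py's actual layout may differ.
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch'] + 1  # epoch to resume from
```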
If you have any questions, please feel free to reach out to me at [email protected].
This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2022-00156295).
If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:

