Yu Gao;Weihong Ren;Weibo Jiang;Qian Dong;Wei Nie;Wenhao Wu;Honghai Liu
JADFER: Exploring Spatial-Contextual Interaction With Joint Attention Dropping for Facial Expression Recognition
IEEE Transactions on Affective Computing, vol. 16, no. 2, pp. 655-668. Published 2024-09-05.
DOI: 10.1109/TAFFC.2024.3454988
URL: https://ieeexplore.ieee.org/document/10666158/
Impact factor: 9.8 (JCR Q1, Computer Science, Artificial Intelligence)
Abstract
Facial Expression Recognition (FER) aims to categorize the emotional expressions shown on a human face, and it is challenging under unconstrained conditions such as face occlusions and pose variations. Recent methods usually adopt self-attention or cross-attention to explore global or local relationships among features at different levels. However, these methods tend to focus on redundant facial regions, which causes the model to overfit. To address this problem, we propose a new FER model named JADFER, which drops the joint attention in the weight matrix to adaptively enhance facial expression representations. Specifically, JADFER consists of three components: Spatial Branch (SB), Contextual Branch (CB), and Spatial-Contextual Interaction (SCI). First, SB runs $N$ paths in parallel, and a Variety loss guides these paths to focus on different discriminative regions. Meanwhile, CB abstracts contextual facial representations using self-attention with Joint Attention Dropping (JAD). Then, SCI uses the spatial features from SB to query the contextual representations from CB through cross-attention with JAD, which regulates the attention weights by dropping similar activations to further enhance the facial embeddings. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods on several FER benchmarks.
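The abstract does not give the exact JAD rule, so the following is only a minimal NumPy sketch of the general idea: spatial features query contextual features through cross-attention, and attention weights that several queries share are dropped before renormalizing. The dropping rule used here (a contextual token is dropped when every query assigns it a weight above `tau`), the function names, and the threshold `tau` are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf get weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_attention_with_dropping(spatial, contextual, tau=0.3):
    """Cross-attention with a hypothetical joint-dropping rule.

    spatial:    (N, d) array, one query per spatial path.
    contextual: (M, d) array, used as both keys and values.
    tau:        assumed threshold for calling a token "jointly attended".
    """
    d = spatial.shape[-1]
    scores = spatial @ contextual.T / np.sqrt(d)      # (N, M)
    weights = softmax(scores, axis=-1)

    # Assumption: a token is "jointly attended" when every query
    # gives it a weight above tau.
    joint = np.all(weights > tau, axis=0)             # (M,)

    # Drop those tokens and renormalize, unless that would remove all tokens.
    if joint.any() and not joint.all():
        scores = np.where(joint[None, :], -np.inf, scores)
        weights = softmax(scores, axis=-1)

    return weights @ contextual, weights, joint

# Toy inputs: two spatial queries, three contextual tokens.
spatial = np.array([[1.0, 0.5], [0.5, 1.0]])
contextual = np.array([[4.0, 4.0], [4.0, 0.0], [0.0, 4.0]])
out, weights, joint = cross_attention_with_dropping(spatial, contextual)
print(joint)
print(weights.round(3))
```

In this toy setup both queries initially concentrate on the first token, so the sketch masks it and each query redistributes its attention over the remaining tokens. A real implementation would likely operate on batched multi-head tensors and apply dropping only during training; those details are not stated in the abstract.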
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.