MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Unit Detection
Xuri Ge, Joemon M. Jose, Songpei Xu, Xiao Liu, Hu Han
ACM Transactions on Intelligent Systems and Technology. Published 2024-02-09. DOI: 10.1145/3643863
Abstract
The Facial Action Coding System (FACS) encodes action units (AUs) in facial images and has attracted extensive research attention due to its wide use in facial expression analysis. Many methods that perform well on automatic facial action unit (AU) detection focus primarily on modelling the various AU relations between corresponding local muscle areas, or on mining global attention-aware facial features; however, they neglect the dynamic interactions between local and global features. We argue that encoding AU features from a single perspective may fail to capture the rich contextual information between regional and global face features, as well as the fine-grained variability across AUs that arises from the diversity of expressions and individual characteristics. In this paper, we propose a novel Multi-level Graph Relational Reasoning Network (termed MGRR-Net) for facial AU detection. Each layer of MGRR-Net performs multi-level (i.e., region-level, pixel-wise, and channel-wise) feature learning. On the one hand, region-level feature learning encodes the correlations across different AUs by applying a graph neural network to local face-patch features. On the other hand, pixel-wise and channel-wise feature learning via graph attention networks (GAT) enhances the discriminative ability of AU features by adaptively recalibrating the pixel and channel responses of the global face features. A hierarchical fusion strategy then combines the features from the three levels via gated fusion cells to further improve AU discrimination. Extensive experiments on the DISFA and BP4D AU datasets show that the proposed approach outperforms state-of-the-art methods.
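To make the described architecture concrete, below is a minimal PyTorch sketch of one MGRR-Net-style layer, based only on the abstract: region-level reasoning over AU nodes with a graph neural network, pixel-wise and channel-wise attention that recalibrates the global face features, and gated fusion cells that combine the three levels. All module names (GatedFusionCell, MultiLevelLayer), the learnable AU adjacency, the scaled dot-product attention form, and the dimensions are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch of one MGRR-Net-style layer, inferred from the abstract only.
import torch
import torch.nn as nn


class GatedFusionCell(nn.Module):
    """Fuses two feature tensors with a learned sigmoid gate (assumed form)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a, b):
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1.0 - g) * b


class MultiLevelLayer(nn.Module):
    """One layer of region-, pixel-, and channel-level relational reasoning."""
    def __init__(self, num_aus, dim):
        super().__init__()
        # Region level: learnable AU-to-AU adjacency for GCN-style reasoning.
        self.adj = nn.Parameter(torch.eye(num_aus))
        self.region_proj = nn.Linear(dim, dim)
        # Pixel level: each AU attends over the pixel nodes of the global map.
        self.pix_q = nn.Linear(dim, dim)
        self.pix_k = nn.Linear(dim, dim)
        # Channel level: per-AU attention weights over channel nodes.
        self.ch_q = nn.Linear(dim, dim)
        self.fuse_pc = GatedFusionCell(dim)
        self.fuse_all = GatedFusionCell(dim)

    def forward(self, au_feats, global_map):
        # au_feats:   (B, N, D)    per-AU local face-patch features
        # global_map: (B, D, H, W) global face feature map
        B, N, D = au_feats.shape
        pixels = global_map.flatten(2).transpose(1, 2)        # (B, H*W, D)

        # 1) Region level: propagate AU features over the learned AU graph.
        region = torch.relu(self.region_proj(
            torch.softmax(self.adj, dim=-1) @ au_feats))      # (B, N, D)

        # 2) Pixel level: attention-weighted aggregation of pixel nodes.
        attn = torch.softmax(
            self.pix_q(au_feats) @ self.pix_k(pixels).transpose(1, 2)
            / D ** 0.5, dim=-1)                               # (B, N, H*W)
        pixel = attn @ pixels                                 # (B, N, D)

        # 3) Channel level: recalibrate channel responses per AU.
        ch_attn = torch.softmax(self.ch_q(au_feats), dim=-1)  # (B, N, D)
        channel = ch_attn * pixels.mean(dim=1, keepdim=True)  # (B, N, D)

        # Hierarchical gated fusion: pixel + channel first, then region.
        return self.fuse_all(region, self.fuse_pc(pixel, channel))


# Usage: 8 AUs, 64-dim features, a 14x14 global map (all sizes hypothetical).
layer = MultiLevelLayer(num_aus=8, dim=64)
out = layer(torch.randn(2, 8, 64), torch.randn(2, 64, 14, 14))
print(out.shape)  # torch.Size([2, 8, 64])
```

The hierarchical order assumed here (pixel- and channel-level features fused first, then fused with the region-level features) is one plausible reading of the abstract's "hierarchical fusion strategy"; consult the paper itself for the exact formulation.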
Journal Introduction:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms, and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) that allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST is published bimonthly (six issues a year). Each issue contains 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.