Multi-loss, feature fusion and improved top-two-voting ensemble for facial expression recognition in the wild

Impact Factor: 6.3; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence); Journal: Neural Networks; Pub Date: 2024-11-26; DOI: 10.1016/j.neunet.2024.106937
Guangyao Zhou , Yuanlun Xie , Yiqin Fu , Zhaokun Wang
Neural Networks, volume 183, Article 106937. https://www.sciencedirect.com/science/article/pii/S0893608024008669

Abstract

Facial expression recognition (FER) in the wild is a challenging pattern recognition task that is hampered by low image quality and has attracted broad interest in computer vision. Existing FER methods have not reached the accuracy needed for practical applications, especially in scenarios with low fault tolerance, which limits the adaptability of FER. To explore whether the accuracy of FER in the wild can be improved further, this paper proposes a novel single model named R18+FAML and an ensemble model named R18+FAML-FGA-T2V, which together apply feature fusion within a single network, feature fusion among multiple networks, and an ensemble decision strategy. Built on a ResNet18 (R18) backbone, R18+FAML combines internal feature fusion with three attention blocks and uses multiple loss functions (FAML) to improve the diversity of feature extraction. To integrate feature extractors from multiple networks effectively, we propose feature fusion among networks based on the genetic algorithm (FGA). To take more classification information into account, we propose an ensemble strategy, the improved top-two-voting (T2V) of multiple networks with the same structure. Combining these strategies, R18+FAML-FGA-T2V can focus on the main expression-aware areas by integrating the areas of interest of multiple networks. In experiments on three challenging in-the-wild FER datasets, RAF-DB, AffectNet-8 and AffectNet-7, the single model R18+FAML achieves accuracies of 90.32%, 62.17% and 65.83%, and the ensemble model R18+FAML-FGA-T2V achieves 91.59%, 63.27% and 66.63%, respectively; both achieve state-of-the-art results.
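
The abstract does not spell out how the improved top-two-voting works, so the following is only a minimal sketch of the general idea of letting each network's second choice count toward the ensemble decision. Everything in it is an assumption for illustration: the function name `top_two_vote`, the `second_weight` parameter, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Sequence


def top_two_vote(
    probs_per_model: Sequence[Sequence[float]],
    second_weight: float = 0.5,
) -> int:
    """Return the class index chosen by a simple top-two vote.

    Hypothetical scheme (not the paper's exact T2V): every model gives one
    vote to its most probable class and `second_weight` votes to its second
    most probable class. Assumes each model scores at least two classes.
    """
    votes = defaultdict(float)
    prob_sum = defaultdict(float)
    for probs in probs_per_model:
        # Rank class indices by descending probability for this model.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        votes[ranked[0]] += 1.0
        votes[ranked[1]] += second_weight
        # Keep the summed probabilities as a tie-breaker.
        for c, p in enumerate(probs):
            prob_sum[c] += p
    # Most votes wins; summed probability breaks ties.
    return max(votes, key=lambda c: (votes[c], prob_sum[c]))


if __name__ == "__main__":
    # Three same-structure networks scoring three classes (made-up numbers).
    models = [
        [0.50, 0.45, 0.05],
        [0.10, 0.48, 0.42],
        [0.46, 0.44, 0.10],
    ]
    print(top_two_vote(models))  # prints 1
```

With these made-up scores, plain majority voting over top-1 predictions would pick class 0 (two of three networks), while the sketch picks class 1 because every network ranks it first or second. Setting `second_weight=0.0` reduces the sketch to ordinary majority voting.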
Source journal
Neural Networks (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.