Sipeng Yang, Hongyu Huang, Ying Sophie Huang, Xiaogang Jin
Computer Animation and Virtual Worlds, vol. 35, no. 3. Published 2024-05-17. DOI: 10.1002/cav.2246
Impact factor: 0.9 (JCR Q4, Computer Science, Software Engineering)
https://onlinelibrary.wiley.com/doi/10.1002/cav.2246
Citations: 0
Facial action units detection using temporal context and feature reassignment
Abstract
Facial action units (AUs) encode the activations of facial muscle groups and play a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily analyze single images, which limits their ability to exploit rich temporal context for robust outcomes. Moreover, the available datasets remain limited in scale, so models trained on them tend to overfit. This paper proposes a novel AU detection method that integrates spatial and temporal data with inter-subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to capture both temporal context and identity-independent features, we introduce a temporal feature combination and feature reassignment (TC&FR) module, which transforms single-image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to rely on identity-independent features and temporal context, which makes its predictions more robust. Experimental results demonstrate the improvements brought by the proposed modules and the state-of-the-art (SOTA) results achieved by our method.
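The abstract names two operations in the TC&FR module but does not say how they are implemented. The following NumPy sketch is only an illustration of the two ideas, not the authors' method. It assumes per-frame regional features are stored as an array of shape `(frames, regions, dims)`. The function names `to_temporal_windows` and `reassign_regions`, the sliding-window grouping, and the swap-by-region rule are all hypothetical choices made for this example.

```python
import numpy as np


def to_temporal_windows(frame_feats: np.ndarray, window: int) -> np.ndarray:
    """Group per-frame features into overlapping temporal windows.

    frame_feats: array of shape (T, R, D) = (frames, facial regions, feature dims).
    Returns an array of shape (T - window + 1, window, R, D).
    Assumption: a plain sliding window; the paper may combine frames differently.
    """
    num_frames = frame_feats.shape[0]
    if window < 1 or window > num_frames:
        raise ValueError("window must be between 1 and the number of frames")
    return np.stack(
        [frame_feats[t:t + window] for t in range(num_frames - window + 1)]
    )


def reassign_regions(seq_a: np.ndarray, seq_b: np.ndarray, region_mask: np.ndarray):
    """Swap the selected regional features between two subjects' sequences.

    seq_a, seq_b: arrays of shape (T, R, D) from two different subjects.
    region_mask: boolean array of length R; True marks regions to exchange.
    Assumption: a simple swap stands in for "feature reassignment"; in a real
    training setup the AU labels tied to those regions would have to follow.
    """
    if seq_a.shape != seq_b.shape:
        raise ValueError("both sequences must have the same shape")
    mixed_a, mixed_b = seq_a.copy(), seq_b.copy()
    mixed_a[:, region_mask] = seq_b[:, region_mask]
    mixed_b[:, region_mask] = seq_a[:, region_mask]
    return mixed_a, mixed_b
```

In this sketch, mixing regions from different subjects means no single sequence carries one person's identity cues end to end, which is one plausible way to push a model toward identity-independent features.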
About the journal:
With the advent of very powerful PCs and high-end graphics cards, there has been remarkable development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.