MFDAN: Multi-Level Flow-Driven Attention Network for Micro-Expression Recognition

IEEE Transactions on Circuits and Systems for Video Technology · IF 11.1 · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC (Region 1, Engineering & Technology) · Pub Date: 2024-08-02 · DOI: 10.1109/TCSVT.2024.3437481
Wenhao Cai;Junli Zhao;Ran Yi;Minjing Yu;Fuqing Duan;Zhenkuan Pan;Yong-Jin Liu
Volume 34, Issue 12, pp. 12823-12836 · Citations: 0

Abstract

Facial expressions are an essential part of human emotional communication, and micro-expressions (MEs), as transient and nearly imperceptible non-verbal signals, can reveal real human emotions. However, subtle motion variations and limited, imbalanced samples make micro-expression recognition (MER) challenging. In this paper, we design MFDAN, a novel dual-branch learning framework with multi-level flow-driven attention for micro-expression recognition, which integrates an optical-flow prior to guide attention learning in the image-encoding branch, enabling the model to focus on the facial regions most discriminative for subtle motion patterns. First, we extract optical-flow information with an optical-flow encoding module. Then, in the image-encoding module, we construct a Transformer structure containing an optical-flow-driven attention mechanism, which locates the regions of interest for micro-expressions in the image according to the positional information of the optical flow, capturing more sensitive and fine-grained micro-expression features. By combining prior knowledge with data-driven learning, and by introducing the DropKey operation and Focal Loss, our method handles subtle micro-expression features on small, imbalanced datasets. In extensive experiments on three independent datasets (SMIC-HS, SAMM, and CASME II) and a composite database, leave-one-subject-out (LOSO) evaluation shows that our method outperforms state-of-the-art methods, especially on the composite database.
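The abstract names three concrete ingredients: attention biased by an optical-flow prior, the DropKey operation (masking random keys before the softmax), and Focal Loss for class imbalance. As a rough illustration of how those pieces fit together, here is a minimal single-head NumPy sketch; the function names, shapes, and the additive-bias formulation of the flow prior are assumptions for illustration, not the authors' published implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def flow_driven_attention(q, k, v, flow_saliency, drop_prob=0.0, rng=None):
    """Single-head scaled dot-product attention whose logits are biased by a
    per-token optical-flow saliency score, with DropKey-style regularization.

    q, k, v: (N, d) token features; flow_saliency: (N,) motion scores.
    The additive bias and all shapes are illustrative assumptions.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)             # (N, N) query-key scores
    logits = logits + flow_saliency[None, :]  # up-weight keys with strong flow
    if drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng(0)
        # DropKey: mask random keys *before* softmax, so the remaining
        # attention weights renormalize over the surviving keys.
        drop = rng.random(logits.shape) < drop_prob
        logits = np.where(drop, -np.inf, logits)
    return softmax(logits) @ v

def focal_loss(probs, target, gamma=2.0):
    """Focal Loss (Lin et al.): down-weights easy, confidently classified
    examples, which helps on small, class-imbalanced ME datasets."""
    pt = probs[target]                        # predicted prob. of true class
    return -((1.0 - pt) ** gamma) * np.log(pt)
```

Note how the flow saliency enters as an additive logit bias on the key axis, so patches with strong motion attract more attention from every query; confident (easy) predictions contribute a much smaller focal loss than uncertain ones.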
Source Journal Metrics
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
Journal Introduction: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.