Duration-aware and mode-aware micro-expression spotting for long video sequences
Jing Liu, Xin Li, Jiaqi Zhang, Guangtao Zhai, Yuting Su, Yuyi Zhang, Bo Wang
Signal Processing: Image Communication, vol. 129, Article 117192. Published 2024-08-10.
DOI: 10.1016/j.image.2024.117192
https://www.sciencedirect.com/science/article/pii/S0923596524000936
Abstract
Micro-expressions (MEs) are unconscious, brief, and subtle facial movements that reveal people's true emotions. Locating MEs is a prerequisite for classifying them, yet only a few studies address this task. Among these, sliding-window-based methods are the most prevalent. Because of differences in individual physiological and psychological mechanisms, along with some uncontrollable factors, the durations and transition modes of MEs vary greatly. Constrained by a fixed window scale and mode, traditional sliding-window-based ME spotting methods fail to capture the motion changes of all MEs accurately, which degrades performance. This paper proposes an ensemble-learning-based duration- and mode-aware (DMA) ME spotting framework. Specifically, we use multiple sliding windows of different scales and modes to generate multiple weak detectors, each suited to MEs with a certain duration and transition mode. To obtain a more comprehensive strong detector, we integrate the outputs of the weak detectors using a voting-based aggregation module. Furthermore, a novel interval generation scheme merges close peaks and their neighboring frames into a complete ME interval. Experimental results on two long-video databases show that the proposed DMA framework performs promisingly compared with state-of-the-art methods. The code is available at https://github.com/TJUMMG/DMA-ME-Spotting.
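The pipeline outlined in the abstract (multi-scale weak detectors, voting, interval generation) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. It assumes a hypothetical one-dimensional per-frame motion signal, varies only the window scale (transition modes are not modeled), and every function name, threshold, and parameter below is an illustrative assumption.

```python
from typing import List, Sequence, Tuple


def weak_detector(signal: Sequence[float], half_width: int, threshold: float) -> List[bool]:
    """Flag frames that exceed the mean of the two frames `half_width`
    steps before and after by more than `threshold` (assumed contrast rule)."""
    n = len(signal)
    flags = [False] * n
    for i in range(half_width, n - half_width):
        baseline = 0.5 * (signal[i - half_width] + signal[i + half_width])
        flags[i] = signal[i] - baseline > threshold
    return flags


def vote(flag_sets: List[List[bool]], min_votes: int) -> List[int]:
    """Return the frames flagged by at least `min_votes` weak detectors."""
    n = len(flag_sets[0])
    return [i for i in range(n) if sum(f[i] for f in flag_sets) >= min_votes]


def merge_peaks(peaks: List[int], max_gap: int, pad: int, n_frames: int) -> List[Tuple[int, int]]:
    """Merge peaks at most `max_gap` frames apart, then extend each group
    by `pad` neighboring frames on both sides, clipped to the video length."""
    groups: List[List[int]] = []
    for p in sorted(peaks):
        if groups and p - groups[-1][1] <= max_gap:
            groups[-1][1] = p          # close to the previous peak: same interval
        else:
            groups.append([p, p])      # far from the previous peak: new interval
    return [(max(0, s - pad), min(n_frames - 1, e + pad)) for s, e in groups]


def spot(signal: Sequence[float],
         half_widths: Sequence[int] = (2, 4, 6),
         threshold: float = 0.5,
         min_votes: int = 2,
         max_gap: int = 3,
         pad: int = 2) -> List[Tuple[int, int]]:
    """Run one weak detector per window scale, vote, and build intervals."""
    flag_sets = [weak_detector(signal, k, threshold) for k in half_widths]
    peaks = vote(flag_sets, min_votes)
    return merge_peaks(peaks, max_gap, pad, len(signal))


# Toy signal: 40 flat frames with one short bump centered on frame 20.
demo = [0.0] * 40
demo[18:23] = [0.3, 0.8, 1.0, 0.8, 0.3]
print(spot(demo))  # -> [(17, 23)]
```

In this toy case the narrowest window flags only frame 20, while the two wider windows flag frames 19 to 21, so voting keeps all three frames. That is the motivation for combining scales: no single fixed window responds well to every duration.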
About the journal:
Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following:
To present a forum for the advancement of theory and practice of image communication.
To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems.
To contribute to a rapid information exchange between the industrial and academic environments.
The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The journal is self-supporting from subscription income and carries a minimal amount of advertising. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.
Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments.
Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.