Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

Xuan Zhou
Journal: J. Inf. Process. Syst.
Publication date: 2021-04-01
Publication type: Journal Article
DOI: 10.3745/JIPS.01.0067 (https://doi.org/10.3745/JIPS.01.0067)
Citations: 7

Abstract

Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To address this problem, a video facial expression recognition method based on a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed, and a double-layer cascade structure is used to detect the face in each video image. Two deep convolutional neural networks then extract the temporal and spatial facial features from the video: the spatial network extracts spatial features from each static expression frame, and the temporal network extracts dynamic features from the optical flow computed over multiple expression frames. The spatiotemporal features learned by the two networks are combined by multiplicative fusion. Finally, the fused features are fed to a support vector machine, which performs the facial expression classification. Experiments on the eNTERFACE, RML, and AFEW 6.0 datasets show that the proposed method reaches recognition rates of 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method achieves higher recognition accuracy than other recently reported methods.
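The fusion step can be illustrated with a minimal sketch. The abstract does not say how the multiplication is carried out, so this example assumes an element-wise product of two equal-length feature vectors, each scaled to unit length first; the function names, the normalization step, and the toy 4-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def multiplicative_fusion(spatial_feat, temporal_feat):
    """Fuse two equal-length feature vectors by element-wise multiplication.

    Assumption: both vectors are L2-normalized before the product; the paper's
    abstract does not state which normalization, if any, is applied.
    """
    if len(spatial_feat) != len(temporal_feat):
        raise ValueError("spatial and temporal features must have the same length")
    spatial_unit = l2_normalize(spatial_feat)
    temporal_unit = l2_normalize(temporal_feat)
    return [s * t for s, t in zip(spatial_unit, temporal_unit)]


# Toy vectors standing in for the outputs of the spatial and temporal CNNs.
spatial = [3.0, 0.0, 4.0, 0.0]
temporal = [0.0, 0.0, 1.0, 0.0]
fused = multiplicative_fusion(spatial, temporal)
```

In a full pipeline, one fused vector per video clip would be the input to the support vector machine classifier. A product of this kind emphasizes dimensions where both streams respond and suppresses dimensions where either stream is near zero.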