Facial expression recognition in videos using hybrid CNN & ConvLSTM.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management Pub Date : 2023-01-01 Epub Date: 2023-03-21 DOI:10.1007/s41870-023-01183-0

Rajesh Singh, Sumeet Saurav, Tarun Kumar, Ravi Saini, Anil Vohra, Sanjay Singh

Volume 15, issue 4, pages 1819-1830. Publication type: Journal Article. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10028317/pdf/

Citations: 5

Abstract

Three-dimensional convolutional neural networks (3D-CNN) and long short-term memory (LSTM) networks have consistently outperformed many other approaches in video-based facial expression recognition (VFER). The vanilla fully-connected LSTM (FC-LSTM), however, unrolls each image into a one-dimensional vector, which discards crucial spatial information. Convolutional LSTM (ConvLSTM) overcomes this limitation by performing the LSTM operations as convolutions, so no unrolling is needed and useful spatial information is retained. Motivated by this, we propose a neural network architecture that combines 3D-CNN and ConvLSTM for VFER. The proposed hybrid architecture captures spatiotemporal information from video sequences of emotions and attains competitive accuracy on three publicly available FER datasets: SAVEE, CK+, and AFEW. The experimental results demonstrate excellent performance without external emotional data, with the added advantage of a simple model that has fewer parameters. Moreover, compared with state-of-the-art deep learning models, our FER pipeline improves execution speed manyfold while achieving competitive recognition accuracy. Hence, the proposed FER pipeline is an appropriate candidate for real-time facial expression recognition on resource-limited embedded platforms.
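The contrast between FC-LSTM and ConvLSTM described above can be made concrete with a small parameter count. The sketch below is illustrative only: the feature-map size, hidden sizes, and kernel size are assumed values, not taken from the paper, and the formulas cover standard cells without peephole connections and with one bias per gate. It shows that an FC-LSTM must flatten each frame into a long vector and keeps a vector state, whereas a ConvLSTM keeps a state with the frame's spatial layout and has a weight count that does not depend on the frame's height or width.

```python
# Illustrative sizes only; none of these values come from the paper.


def fc_lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Weight count of a standard fully-connected LSTM cell (no peepholes).

    Each of the 4 gates has an input matrix, a recurrent matrix, and a bias.
    """
    return 4 * hidden_dim * (input_dim + hidden_dim + 1)


def conv_lstm_params(in_channels: int, hidden_channels: int, kernel: int) -> int:
    """Weight count of a ConvLSTM cell (no peepholes).

    Each of the 4 gates applies a kernel x kernel convolution to the input
    and hidden feature maps, plus one bias per output channel.
    """
    per_gate = kernel * kernel * (in_channels + hidden_channels) + 1
    return 4 * hidden_channels * per_gate


height, width, channels = 32, 32, 64  # assumed per-frame feature map
hidden_units = 256                     # assumed FC-LSTM state size
hidden_channels, kernel = 64, 3        # assumed ConvLSTM settings

flat_dim = height * width * channels   # FC-LSTM flattens each frame
fc = fc_lstm_params(flat_dim, hidden_units)
conv = conv_lstm_params(channels, hidden_channels, kernel)

print("FC-LSTM input vector length:", flat_dim)
print("FC-LSTM state shape:", (hidden_units,))
print("ConvLSTM state shape (same padding):", (height, width, hidden_channels))
print("FC-LSTM parameters:", fc)
print("ConvLSTM parameters:", conv)
```

Under these assumed sizes the ConvLSTM cell needs far fewer weights than the FC-LSTM cell, which is consistent with the abstract's point about a simpler model with fewer parameters. The actual layer configuration of the proposed network is not given in the abstract.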
