Peining Zhen, Hai-Bao Chen, Yuan Cheng, Zhigang Ji, Bin Liu, Hao Yu
ACM Transactions on Internet of Things, pp. 1–26. Published 2021-07-15. DOI: 10.1145/3464941.
Fast Video Facial Expression Recognition by a Deeply Tensor-Compressed LSTM Neural Network for Mobile Devices
Mobile devices usually have limited computation and storage resources, which seriously hinders the deployment of deep neural network applications on them. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is then deeply compressed by means of quantization and tensorization for mobile device implementation. On the Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expressions in the Wild (AFEW) 7.0 datasets, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy, respectively, and compresses the network model by up to 221× while reducing training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. With the leveraged compression methods, the latency of the feature extractor and the LSTM predictor can be reduced by 30.20× and 6.62×, respectively, on board. Furthermore, the spatio-temporal model consumes only 57.19 MB of DRAM and 5.67 W of power when running on the board.
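The tensorization mentioned above replaces large dense LSTM weight matrices with low-rank tensor factorizations so that only the small factor cores are stored. A minimal NumPy sketch of the idea, assuming a tensor-train (TT) format with illustrative mode sizes and TT-ranks (not the paper's exact configuration):

```python
import numpy as np

# Illustrative sketch of tensor-train (TT) compression of one LSTM weight
# matrix. The mode sizes and TT-ranks below are assumptions chosen for this
# example, not the paper's exact settings.

out_modes = [8, 8, 16]    # factorization of the output dim 4*hidden = 1024
in_modes  = [8, 8, 16]    # factorization of the input dim = 1024
ranks     = [1, 4, 4, 1]  # TT-ranks (boundary ranks are always 1)

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
    for k in range(len(out_modes))
]

def tt_to_matrix(cores):
    """Contract the TT cores back into the full (prod(out), prod(in)) matrix."""
    t = cores[0]                                  # shape (1, o0, i0, r1)
    for c in cores[1:]:
        t = np.einsum('wxyz,zuvr->wxuyvr', t, c)  # attach the next core
        w, x, u, y, v, r = t.shape
        t = t.reshape(w, x * u, y * v, r)         # merge output/input modes
    return t.reshape(t.shape[1], t.shape[2])

W = tt_to_matrix(cores)                 # dense 1024 x 1024 weight matrix
dense_params = W.size                   # 1,048,576 weights if stored densely
tt_params = sum(c.size for c in cores)  # only 2,304 weights in TT format
print(f"compression: {dense_params / tt_params:.0f}x")
```

At inference time the full matrix is never materialized; the input is contracted against the cores one by one, which is what makes the on-device memory savings possible.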