Dynamic Hand Gesture Recognition Using 3D-CNN and LSTM Networks

IF 2.0 | Zone 4, Computer Science | Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS | Cmc-computers Materials & Continua | Pub Date: 2022-01-01 | DOI: 10.32604/cmc.2022.019586

Muneeb Ur Rehman, Fawad Ahmed, Muhammad Attique Khan, U. Tariq, Faisal Abdulaziz Alfouzan, Nouf M. Alzahrani, Jawad Ahmad


Citations: 19

Abstract

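Recognizing dynamic hand gestures in real time is difficult because the system cannot know in advance when or where a gesture starts and ends in a video stream. Vision-based gesture recognition has attracted many researchers because of its wide range of applications. This paper proposes a deep learning architecture that combines a 3D Convolutional Neural Network (3D-CNN) with a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatio-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN extracts spectral and spatial features, which are then passed to the LSTM network for classification. The proposed model is a lightweight architecture with only 3.7 million trainable parameters. The model was evaluated on 15 classes from the publicly available 20BN-jester dataset. It was trained on 2000 video clips per class, split into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on the training and testing data, respectively. We further show that combining a 3D-CNN with an LSTM gives better results than MobileNetv2 + LSTM.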
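The abstract does not give the layer configuration, so the following is only a rough sizing sketch under assumed values, not the authors' network. It illustrates how a stack of 3D convolution blocks can reduce a clip to a short sequence of feature vectors that an LSTM then classifies, and how the parameter count of such a pipeline can be estimated. The clip size (3 x 16 x 64 x 64), the channel widths, the kernel and pooling sizes, the LSTM hidden size, and the names `Conv3DBlock`, `summarize`, and `ASSUMED_BLOCKS` are all hypothetical; only the 15 output classes come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Conv3DBlock:
    """One hypothetical 3D convolution followed by max-pooling."""
    out_channels: int
    kernel: Tuple[int, int, int]  # (time, height, width)
    pool: Tuple[int, int, int]    # pooling factors (time, height, width)


def conv3d_params(in_channels: int, block: Conv3DBlock) -> int:
    kt, kh, kw = block.kernel
    # One kt*kh*kw*in_channels filter plus one bias per output channel.
    return (kt * kh * kw * in_channels + 1) * block.out_channels


def lstm_params(input_size: int, hidden: int) -> int:
    # Four gates, each with input weights, recurrent weights and one bias
    # vector (Keras-style count; PyTorch's nn.LSTM stores two bias vectors).
    return 4 * ((input_size + hidden) * hidden + hidden)


def summarize(clip_shape: Tuple[int, int, int, int],
              blocks: List[Conv3DBlock],
              lstm_hidden: int,
              num_classes: int) -> Dict[str, int]:
    channels, frames, height, width = clip_shape
    total = 0
    for block in blocks:
        total += conv3d_params(channels, block)
        pt, ph, pw = block.pool
        # "Same" padding is assumed, so only pooling shrinks the volume.
        channels = block.out_channels
        frames, height, width = frames // pt, height // ph, width // pw
    # Each remaining time step is flattened into one LSTM input vector.
    features = channels * height * width
    total += lstm_params(features, lstm_hidden)
    total += (lstm_hidden + 1) * num_classes  # final dense classifier
    return {"time_steps": frames, "features_per_step": features, "params": total}


# Assumed configuration, chosen only for illustration.
ASSUMED_BLOCKS = [
    Conv3DBlock(32, (3, 3, 3), (1, 2, 2)),
    Conv3DBlock(64, (3, 3, 3), (2, 2, 2)),
    Conv3DBlock(128, (3, 3, 3), (2, 2, 2)),
    Conv3DBlock(128, (3, 3, 3), (1, 2, 2)),
]

summary = summarize((3, 16, 64, 64), ASSUMED_BLOCKS, lstm_hidden=256, num_classes=15)
print(summary)
```

With these assumed sizes most of the parameters sit in the LSTM input weights, so the amount of spatial pooling before the LSTM is the main lever for keeping such a model in the low millions of parameters. The total printed here is a property of the assumed configuration and is not expected to match the 3.7 million reported in the abstract.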

Source journal

Cmc-computers Materials & Continua (Engineering Technology - Materials Science: Multidisciplinary)

CiteScore: 5.30
Self-citation rate: 19.40%
Articles published: 345
Review time: 1 month

About the journal: This journal publishes original research papers in the areas of computer networks, artificial intelligence, big data management, software engineering, multimedia, cyber security, internet of things, materials genome, integrated materials science, data analysis, modeling, and the engineering of designing and manufacturing modern functional and multifunctional materials. Novel high-performance computing methods, big data analysis, and artificial intelligence that advance material technologies are especially welcome.