Aldi Sidik Permana, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi
{"title":"基于单流空间卷积神经网络的手部运动识别","authors":"Aldi Sidik Permana, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi","doi":"10.23919/EECSI50503.2020.9251896","DOIUrl":null,"url":null,"abstract":"Human-robot interaction can be through several ways, such as through device control, sounds, brain, and body, or hand gesture. There are two main issues: the ability to adapt to extreme settings and the number of frames processed concerning memory capabilities. Although it is necessary to be careful with the selection of the number of frames so as not to burden the memory, this paper proposed identifying hand gesture of video using Spatial Convolutional Neural Networks (CNN). The sequential image's spatial arrangement is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16, as CNN architecture is concerned with the depth of learning where there are 13 layers of convolution and three layers of identification. Hand gestures can only be identified into four movements, namely ‘right’, ‘left’, ‘grab’, and ‘phone’. Hand gesture identification on the video using Spatial CNN with an initial accuracy of 87.97%, then the second training increased to 98.05%. Accuracy was obtained after training using 5600 training data and 1120 test data, and the improvement occurred after manual noise reduction was performed.","PeriodicalId":6743,"journal":{"name":"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)","volume":"28 1","pages":"172-176"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hand Movement Identification Using Single-Stream Spatial Convolutional Neural Networks\",\"authors\":\"Aldi Sidik Permana, E. C. 
Djamal, Fikri Nugraha, Fatan Kasyidi\",\"doi\":\"10.23919/EECSI50503.2020.9251896\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human-robot interaction can be through several ways, such as through device control, sounds, brain, and body, or hand gesture. There are two main issues: the ability to adapt to extreme settings and the number of frames processed concerning memory capabilities. Although it is necessary to be careful with the selection of the number of frames so as not to burden the memory, this paper proposed identifying hand gesture of video using Spatial Convolutional Neural Networks (CNN). The sequential image's spatial arrangement is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16, as CNN architecture is concerned with the depth of learning where there are 13 layers of convolution and three layers of identification. Hand gestures can only be identified into four movements, namely ‘right’, ‘left’, ‘grab’, and ‘phone’. Hand gesture identification on the video using Spatial CNN with an initial accuracy of 87.97%, then the second training increased to 98.05%. 
Accuracy was obtained after training using 5600 training data and 1120 test data, and the improvement occurred after manual noise reduction was performed.\",\"PeriodicalId\":6743,\"journal\":{\"name\":\"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)\",\"volume\":\"28 1\",\"pages\":\"172-176\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/EECSI50503.2020.9251896\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/EECSI50503.2020.9251896","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hand Movement Identification Using Single-Stream Spatial Convolutional Neural Networks
Human-robot interaction can occur in several ways, such as through device control, sound, brain signals, body movement, or hand gestures. There are two main issues: the ability to adapt to extreme settings, and the number of frames that can be processed given memory constraints. Since the number of frames must be chosen carefully so as not to overburden memory, this paper proposed identifying hand gestures in video using a spatial Convolutional Neural Network (CNN). The spatial arrangement of the sequential images is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16 as the CNN architecture, a deep network with 13 convolutional layers and three fully connected layers for identification. Hand gestures were identified as one of four movements, namely 'right', 'left', 'grab', and 'phone'. Hand gesture identification on video using the spatial CNN achieved an initial accuracy of 87.97%, which a second round of training increased to 98.05%. These accuracies were obtained after training on 5,600 training samples and 1,120 test samples, and the improvement occurred after manual noise reduction was performed.
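The frame-level decision step described above — assigning each video frame to one of the four gesture classes — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `softmax` head and the hypothetical logit values stand in for the output of the VGG16 network's final identification layer, which is not specified in the abstract.

```python
import numpy as np

# The four gesture classes reported in the paper.
GESTURES = ["right", "left", "grab", "phone"]

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_frames(frame_logits):
    """Map per-frame logits of shape (n_frames, 4) to gesture labels,
    so each frame is identified as part of one hand movement."""
    probs = softmax(np.asarray(frame_logits, dtype=float))
    return [GESTURES[i] for i in probs.argmax(axis=-1)]

# Hypothetical logits for three frames (illustrative values only).
logits = np.array([[2.0, 0.1, 0.0, -1.0],
                   [0.0, 3.0, 0.2, 0.1],
                   [0.1, 0.0, 0.5, 2.5]])
print(classify_frames(logits))  # -> ['right', 'left', 'phone']
```

In a full pipeline, the logits would come from a CNN applied to each extracted frame; the argmax over the softmax probabilities then yields one of the four movement labels per frame.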