Behavior recognition algorithm based on motion capture and enhancement
Yuqi Yang, Jianping Luo
4th International Conference on Information Science, Electrical and Automation Engineering, 10 August 2023
DOI: 10.1117/12.2689663
Motion modeling and temporal modeling are crucial for video behavior recognition. Two-stream networks extract motion information from optical flow, which must be computed in advance and therefore prevents end-to-end training. 3D CNNs can extract spatiotemporal information, but they require substantial computational resources. To address these problems, this paper proposes a plug-and-play motion capture and enhancement network (MCE), which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module computes feature-level temporal differences to capture key motion information over short temporal ranges. The MSTE module models long-range temporal information by equivalently enlarging the temporal receptive field through a multi-scale hierarchical sub-convolution architecture, and further enhances salient motion features with a max-pooling branch. Experiments on the standard behavior recognition datasets Something-Something-V1 and Jester achieve recognition accuracies of 49.6% and 96.9%, respectively. The results show that the proposed method is both effective and efficient.
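The abstract does not give the TMC module's exact formulation, but its core idea, feature-level temporal differencing between adjacent frames, can be sketched as follows. The function name, tensor layout (T, C, H, W), and zero-padding of the final frame are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def temporal_motion_capture(features):
    """Illustrative sketch of feature-level temporal differencing.

    features: array of shape (T, C, H, W) -- per-frame feature maps.
    Returns motion cues of the same shape; the final frame is
    zero-padded (an assumption) so the temporal length T is preserved.
    """
    diff = features[1:] - features[:-1]   # short-range motion between adjacent frames
    pad = np.zeros_like(features[:1])     # pad the last position to keep T frames
    return np.concatenate([diff, pad], axis=0)

# Toy example: 4 frames, 2 channels, 3x3 spatial grid
feats = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 2, 3, 3)
motion = temporal_motion_capture(feats)
print(motion.shape)  # (4, 2, 3, 3)
```

Because the differences are taken on features rather than raw pixels, this kind of motion cue can be computed inside the network and trained end to end, avoiding the precomputed optical flow that two-stream networks depend on.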