Behavior recognition algorithm based on motion capture and enhancement

Yuqi Yang, Jianping Luo
{"title":"基于动作捕捉和增强的行为识别算法","authors":"Yuqi Yang, Jianping Luo","doi":"10.1117/12.2689663","DOIUrl":null,"url":null,"abstract":"Motion modeling and temporal modeling are crucial issues for video behavior recognition. When extracting motion information in two-stream network, the optical flow diagram needs to be calculated in advance and the end-to-end training cannot be realized. 3D CNNs can extract spatiotemporal information, but it requires huge computational resources. To solve these problems, we propose a plug-and-play motion capture and enhancement network (MCE) in this paper, which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module calculates the temporal difference of the feature-level and captures the key motion information in the short temporal range. The MSTE module simulates long-range temporal information by equivalent enlarging the temporal sensitive field through multi-scale hierarchical sub-convolution architecture, and then further enhances the significant motion features by referring to the maxpooling branch. Finally, several experiments are carried out on the behavior recognition standard datasets of Something-Something-V1 and Jester, and the recognition accuracy rates are 49.6% and 96.9%, respectively. Experimental results show that the proposed method is effective and efficient.","PeriodicalId":118234,"journal":{"name":"4th International Conference on Information Science, Electrical and Automation Engineering","volume":"172 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Behavior recognition algorithm based on motion capture and enhancement\",\"authors\":\"Yuqi Yang, Jianping Luo\",\"doi\":\"10.1117/12.2689663\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motion modeling and temporal modeling are crucial issues for video behavior recognition. When extracting motion information in two-stream network, the optical flow diagram needs to be calculated in advance and the end-to-end training cannot be realized. 3D CNNs can extract spatiotemporal information, but it requires huge computational resources. To solve these problems, we propose a plug-and-play motion capture and enhancement network (MCE) in this paper, which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module calculates the temporal difference of the feature-level and captures the key motion information in the short temporal range. The MSTE module simulates long-range temporal information by equivalent enlarging the temporal sensitive field through multi-scale hierarchical sub-convolution architecture, and then further enhances the significant motion features by referring to the maxpooling branch. Finally, several experiments are carried out on the behavior recognition standard datasets of Something-Something-V1 and Jester, and the recognition accuracy rates are 49.6% and 96.9%, respectively. 
Experimental results show that the proposed method is effective and efficient.\",\"PeriodicalId\":118234,\"journal\":{\"name\":\"4th International Conference on Information Science, Electrical and Automation Engineering\",\"volume\":\"172 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"4th International Conference on Information Science, Electrical and Automation Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2689663\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"4th International Conference on Information Science, Electrical and Automation Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2689663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Motion modeling and temporal modeling are crucial for video behavior recognition. Two-stream networks extract motion information from optical flow, which must be computed in advance and therefore prevents end-to-end training. 3D CNNs can extract spatiotemporal information, but they require large computational resources. To address these problems, this paper proposes a plug-and-play motion capture and enhancement network (MCE), which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module computes feature-level temporal differences to capture key motion information over short temporal ranges. The MSTE module models long-range temporal information by equivalently enlarging the temporal receptive field through a multi-scale hierarchical sub-convolution architecture, and then further enhances salient motion features with a max-pooling branch. Experiments on the standard behavior recognition datasets Something-Something-V1 and Jester achieve recognition accuracies of 49.6% and 96.9%, respectively. The results show that the proposed method is both effective and efficient.
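The abstract describes TMC only at a high level: it takes temporal differences at the feature level to capture short-range motion. Below is a minimal PyTorch sketch of such an operation; the module name `TemporalDifference`, the channel-reduction ratio, the frames-per-clip parameter, and the residual fusion are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalDifference(nn.Module):
    """Hypothetical feature-level temporal-difference block in the spirit of TMC."""

    def __init__(self, channels: int, n_frames: int = 8, reduction: int = 4):
        super().__init__()
        self.t = n_frames                      # frames per clip (assumed)
        mid = channels // reduction            # reduction ratio is an assumption
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N*T, C, H, W) feature maps from a 2D backbone.
        nt, c, h, w = x.shape
        n = nt // self.t
        feat = self.squeeze(x).view(n, self.t, -1, h, w)
        # Feature-level temporal difference F[t+1] - F[t]; zero-pad the last
        # step so the temporal length is preserved.
        diff = feat[:, 1:] - feat[:, :-1]
        diff = torch.cat([diff, diff.new_zeros(n, 1, *diff.shape[2:])], dim=1)
        motion = self.bn(self.expand(diff.reshape(nt, -1, h, w)))
        # Residual fusion keeps appearance features alongside the motion cues.
        return x + motion
```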
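Likewise, the MSTE description (multi-scale hierarchical sub-convolutions that equivalently enlarge the temporal receptive field, plus a max-pooling branch) suggests a Res2Net-style temporal block. The sketch below is one interpretation under stated assumptions: the number of scales, the depthwise temporal convolutions, and the sigmoid gating are guesses at how "referring to the maxpooling branch" might be realized.

```python
import torch
import torch.nn as nn

class MultiScaleTemporal(nn.Module):
    """Hypothetical MSTE-style block: hierarchical temporal convs + max-pool gate."""

    def __init__(self, channels: int, n_frames: int = 8, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.t, self.scales = n_frames, scales
        group = channels // scales
        # Chaining one temporal conv per sub-group stacks their receptive
        # fields: scale i effectively covers 2*i + 1 frames.
        self.convs = nn.ModuleList(
            nn.Conv1d(group, group, kernel_size=3, padding=1,
                      groups=group, bias=False)
            for _ in range(scales - 1)
        )
        self.pool = nn.MaxPool1d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N*T, C, H, W). Fold space into the batch so Conv1d runs over T.
        nt, c, h, w = x.shape
        n = nt // self.t
        y = x.view(n, self.t, c, h, w).permute(0, 3, 4, 2, 1)
        y = y.reshape(n * h * w, c, self.t)
        # Hierarchical sub-convolutions: each scale also sees the previous
        # scale's output, which equivalently enlarges the receptive field.
        parts = list(torch.chunk(y, self.scales, dim=1))
        outs, prev = [parts[0]], parts[0]
        for conv, part in zip(self.convs, parts[1:]):
            prev = conv(part + prev)
            outs.append(prev)
        y = torch.cat(outs, dim=1)
        # Max-pooling branch as a gate over temporally salient responses
        # (the fusion rule is an assumption; the abstract does not specify it).
        y = y * torch.sigmoid(self.pool(y))
        y = y.view(n, h, w, c, self.t).permute(0, 4, 3, 1, 2).reshape(nt, c, h, w)
        return x + y
```

In a plug-and-play setting, blocks like these would typically be inserted after selected residual stages of a 2D backbone such as ResNet-50; the abstract does not name the backbone, so this placement is also an assumption.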