Behavior recognition algorithm based on motion capture and enhancement
Yuqi Yang, Jianping Luo
4th International Conference on Information Science, Electrical and Automation Engineering, 10 August 2023
DOI: 10.1117/12.2689663
Motion modeling and temporal modeling are crucial for video behavior recognition. Two-stream networks extract motion information from optical flow, which must be computed in advance and therefore prevents end-to-end training. 3D CNNs can extract spatiotemporal information, but they require substantial computational resources. To address these problems, this paper proposes a plug-and-play motion capture and enhancement network (MCE), which consists of a temporal motion capture module (TMC) and a multi-scale spatiotemporal enhancement module (MSTE). The TMC module computes feature-level temporal differences to capture key motion information over short temporal ranges. The MSTE module models long-range temporal information by equivalently enlarging the temporal receptive field through a multi-scale hierarchical sub-convolution architecture, and further enhances salient motion features with a max-pooling branch. Experiments on the standard behavior recognition datasets Something-Something-V1 and Jester achieve recognition accuracies of 49.6% and 96.9%, respectively. The results show that the proposed method is both effective and efficient.
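The abstract does not give the TMC module's exact formulation, but its core idea, feature-level temporal differencing between adjacent frames, can be sketched as follows. The function name, tensor layout (T, C, H, W), and zero-padding of the final frame are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def temporal_motion_capture(features):
    """Illustrative sketch of feature-level temporal differencing.

    features: array of shape (T, C, H, W) -- per-frame feature maps.
    Returns motion cues of the same shape; the final frame is
    zero-padded (an assumption) so the temporal length T is preserved.
    """
    diff = features[1:] - features[:-1]   # short-range motion between adjacent frames
    pad = np.zeros_like(features[:1])     # pad the last position to keep T frames
    return np.concatenate([diff, pad], axis=0)

# Toy example: 4 frames, 2 channels, 3x3 spatial grid
feats = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 2, 3, 3)
motion = temporal_motion_capture(feats)
print(motion.shape)  # (4, 2, 3, 3)
```

Because the differences are taken on features rather than raw pixels, this kind of motion cue can be computed inside the network and trained end to end, avoiding the precomputed optical flow that two-stream networks depend on.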