Human Interaction Recognition with Skeletal Attention and Shift Graph Convolution
Jin Zhou, Zhenhua Wang, Jiajun Meng, Sheng Liu, Jianhua Zhang, Shengyong Chen
2022 International Joint Conference on Neural Networks (IJCNN), 2022-07-18
DOI: 10.1109/IJCNN55064.2022.9892292 (https://doi.org/10.1109/IJCNN55064.2022.9892292)
Human interaction recognition has wide applications, including intelligent surveillance, intelligent transportation, and sports video analysis. In recent years, advances in deep-learning-based action recognition have improved the performance of human interaction recognition. This paper tackles two key issues in recognizing human interactions: missed targets and inadequate feature representation. To this end, we first design a data preprocessing method based on skeleton estimation and multi-object tracking, which reduces the chance of missed detections. Second, we propose a two-stream network composed of an appearance branch and a pose branch. The appearance branch extracts features enhanced by part affinity maps and part confidence maps, while the pose branch trains a customized Shift-GCN to extract skeletal features from pairs of people. Appearance and pose features are then fused into a more powerful representation of human interactions. Extensive experiments on two existing benchmarks, UT-Interaction and BIT-Interaction, as well as a new dataset that we introduce, Campus-Interaction (CI), show that the proposed approach outperforms state-of-the-art methods.
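The two-stream idea can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the Shift-GCN is customized or how the two feature sets are fused, so the cyclic channel-wise shift across joints, the mean pooling, and the fusion by concatenation below are all assumptions, and every function name is hypothetical. A real shift graph convolution would also apply a learned pointwise transform to each joint after the shift, which is omitted here.

```python
from typing import List

Matrix = List[List[float]]  # rows = joints, columns = feature channels


def non_local_shift(features: Matrix) -> Matrix:
    """Cyclically shift channel c by c joints, so each joint's output
    vector gathers channels that originated at different joints."""
    num_joints = len(features)
    num_channels = len(features[0])
    return [
        [features[(j + c) % num_joints][c] for c in range(num_channels)]
        for j in range(num_joints)
    ]


def mean_pool(features: Matrix) -> List[float]:
    """Average over joints to get one vector per pair of people."""
    num_joints = len(features)
    return [sum(column) / num_joints for column in zip(*features)]


def fuse(appearance: List[float], pose: List[float]) -> List[float]:
    """Late fusion by concatenation (an assumption, not from the abstract)."""
    return appearance + pose


# Toy pair of people: 4 joints in total, 3 channels per joint.
pair_joints = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
shifted = non_local_shift(pair_joints)
pose_feature = mean_pool(shifted)
appearance_feature = [0.5, 0.25]  # placeholder for an appearance-branch output
fused = fuse(appearance_feature, pose_feature)
```

Because the joints of both people sit in one matrix, the shift moves information between the two skeletons as well as within each one, which is one plausible reason to run a shift-based graph network on pairs rather than on individuals.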