Tingting Cai, Xueqin Jiang, Shubo Zhou, Yongguo Li, Yi Yang
{"title":"基于骨架的动作识别的密集连接多时间图卷积网络","authors":"Tingting Cai, Xueqin Jiang, Shubo Zhou, Yongguo Li, Yi Yang","doi":"10.1109/CCISP55629.2022.9974367","DOIUrl":null,"url":null,"abstract":"More and more researchers are devoting themselves to skeleton-based action recognition owing to its high research value. Due to the property of the background suppression and the natural topological graph structure, most of the current researches based on the skeleton graphs construct spatial-temporal graph convolutions. However, due to the forward propagation of the network, the semantic features from joints and bones in the shallow layers may be dispersed in the long diffusion process. To make better utilization of the semantic feature information, we proposed a densely connected and multiple temporal graph convolution network (SMT-DGCN), which fully utilizes the features of each layer by introducing the dense connectivity mechanism into the ST-GCN network, and uses multiple temporal convolution to extract discriminative temporal motion features. Compared to traditional GCNs, our network architecture has the following two innovative advantages: 1) By densely connecting each layer to the semantic features, we are able to reuse features and improve feature utilization compared to the base network. 2) In the temporal modeling stage, the multiple temporal convolution module is employed, which can enrich and refine the temporal features. 
Experiments on the NTU-RGBD dataset demonstrate that our proposed model outperforms most existing studies.","PeriodicalId":431851,"journal":{"name":"2022 7th International Conference on Communication, Image and Signal Processing (CCISP)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Densely Connected and Multiple Temporal Graph Convolution Networks for Skeleton-based Action Recognition\",\"authors\":\"Tingting Cai, Xueqin Jiang, Shubo Zhou, Yongguo Li, Yi Yang\",\"doi\":\"10.1109/CCISP55629.2022.9974367\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"More and more researchers are devoting themselves to skeleton-based action recognition owing to its high research value. Due to the property of the background suppression and the natural topological graph structure, most of the current researches based on the skeleton graphs construct spatial-temporal graph convolutions. However, due to the forward propagation of the network, the semantic features from joints and bones in the shallow layers may be dispersed in the long diffusion process. To make better utilization of the semantic feature information, we proposed a densely connected and multiple temporal graph convolution network (SMT-DGCN), which fully utilizes the features of each layer by introducing the dense connectivity mechanism into the ST-GCN network, and uses multiple temporal convolution to extract discriminative temporal motion features. Compared to traditional GCNs, our network architecture has the following two innovative advantages: 1) By densely connecting each layer to the semantic features, we are able to reuse features and improve feature utilization compared to the base network. 2) In the temporal modeling stage, the multiple temporal convolution module is employed, which can enrich and refine the temporal features. 
Experiments on the NTU-RGBD dataset demonstrate that our proposed model outperforms most existing studies.\",\"PeriodicalId\":431851,\"journal\":{\"name\":\"2022 7th International Conference on Communication, Image and Signal Processing (CCISP)\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 7th International Conference on Communication, Image and Signal Processing (CCISP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCISP55629.2022.9974367\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 7th International Conference on Communication, Image and Signal Processing (CCISP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCISP55629.2022.9974367","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Densely Connected and Multiple Temporal Graph Convolution Networks for Skeleton-based Action Recognition
Skeleton-based action recognition has attracted increasing research attention owing to its high research value. Because skeleton data naturally suppress background clutter and form a topological graph structure, most current studies construct spatial-temporal graph convolutions on skeleton graphs. However, during the network's forward propagation, the semantic features of joints and bones extracted in the shallow layers may be dispersed over the long diffusion process. To make better use of this semantic feature information, we propose a densely connected and multiple temporal graph convolution network (SMT-DGCN), which fully utilizes the features of each layer by introducing a dense-connectivity mechanism into the ST-GCN network, and uses multiple temporal convolutions to extract discriminative temporal motion features. Compared with traditional GCNs, our architecture offers two innovative advantages: 1) by densely connecting the semantic features of each layer, we reuse features and improve feature utilization relative to the base network; 2) in the temporal modeling stage, a multiple temporal convolution module enriches and refines the temporal features. Experiments on the NTU-RGBD dataset demonstrate that our proposed model outperforms most existing approaches.
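The two mechanisms the abstract describes — DenseNet-style feature reuse across GCN layers and multi-kernel temporal convolution — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: box filters stand in for learned temporal kernels, the adjacency matrix is assumed pre-normalized, and all function names are hypothetical.

```python
import numpy as np

def temporal_conv(x, kernel_sizes=(3, 5, 7)):
    """Multiple temporal convolution: x has shape (T, C) = (frames, channels).
    Each branch smooths over a different temporal window (box filter as a
    stand-in for a learned kernel); branch outputs are concatenated along
    the channel axis, enriching the temporal features."""
    branches = []
    for k in kernel_sizes:
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        # average over a length-k window centered on each frame
        out = np.stack([xp[t:t + k].mean(axis=0) for t in range(x.shape[0])])
        branches.append(out)
    return np.concatenate(branches, axis=1)

def dense_gcn_forward(x, adjacency, num_layers=3):
    """Densely connected spatial aggregation: each layer receives the
    concatenation of all previous layers' outputs (DenseNet-style reuse),
    so shallow semantic features are not lost during forward propagation.
    x: (V, C) joint features; adjacency: (V, V) normalized skeleton graph."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=1)   # dense connectivity
        out = adjacency @ inp                    # graph aggregation step
        features.append(out)
    return np.concatenate(features, axis=1)
```

With 3 layers and C input channels, the dense output has C + C + 2C + 4C channels, which is the feature-reuse effect the paper attributes to dense connectivity; a real model would interleave learned weights and nonlinearities at each layer.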