MCNet: A unified multi-center graph convolutional network based on skeletal behavior recognition
Haiping Zhang, Xinhao Zhang, Dongjing Wang, Fuxing Zhou, Junfeng Yan
Alexandria Engineering Journal, Volume 120 (2025), Pages 116-127. DOI: 10.1016/j.aej.2025.01.118
Abstract
The enhanced stability and computational efficiency of skeletal data make it a highly sought-after option for video action recognition. Although existing research on skeleton-based behavior recognition with graph convolutional networks (GCNs) has made progress, the fixed graph structure and the lack of interaction between the objects in the dataset leave traditional models with limited flexibility when recognizing highly similar actions, which affects their final performance. To address these issues, we propose a unified multi-center graph convolutional network (MCNet) for skeletal behavior recognition. Actions with a large movement amplitude can shift the center of the human body. For the recognition of such actions, a multi-center training approach is proposed in which three centers are defined when constructing the topology graph, and a Multi-Center Data Selector (MCDS) differentiates and selects among these centers, enhancing the adaptability of the recognition task. Some action categories are easily confused with one another; to facilitate the recognition of highly similar actions, a multi-modal training scheme is proposed that employs a large-scale language model as a knowledge engine to provide textual descriptions of global actions in the different centers, enabling better differentiation of actions and further improving recognition. Finally, an attention mechanism module aggregates the features of a multi-scale adjacency matrix along the channel dimension. To verify the effectiveness of the proposed network, a series of ablation experiments and model analyses were conducted on three datasets, and the model was compared with other state-of-the-art models, including CTR-GCN, Info-GCN, and STF. The results demonstrate that the proposed model reaches the SOTA level: MCNet outperforms CTR-GCN (the baseline) by 0.6% on X-Sub and 0.3% on X-View of the NTU RGB+D 60 dataset, and the gains are even more pronounced on NTU RGB+D 120, with improvements of up to 0.8% on both the X-Sub and X-Set benchmarks.
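To make the multi-center idea concrete, the sketch below is a minimal, hedged PyTorch illustration of a spatial GCN block that keeps one adjacency matrix per body center and fuses the per-center features with channel-wise attention. The class name MultiCenterGCNBlock, the helper build_adjacency, the choice of three toy centers, and the squeeze-and-excitation style attention are assumptions introduced here for exposition; they are not the authors' released MCNet implementation or their MCDS selector.

```python
# Illustrative sketch only: a minimal multi-center graph convolution block in PyTorch.
# The block, the helper names, and the attention design are assumptions for exposition,
# not the paper's official code.
import torch
import torch.nn as nn

def build_adjacency(edges, center, num_joints):
    """Self-loop-augmented, row-normalized adjacency for one skeleton graph.
    `center` only labels which joint the graph is rooted at; a fuller model
    could use it to build center-relative partitions."""
    A = torch.eye(num_joints)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class MultiCenterGCNBlock(nn.Module):
    """Spatial GCN block with one adjacency per body center; per-center outputs
    are summed and reweighted along the channel dimension by an attention gate."""
    def __init__(self, in_channels, out_channels, adjacencies):
        super().__init__()
        self.register_buffer("A", torch.stack(adjacencies))   # (centers, V, V)
        self.convs = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(self.A.shape[0])
        )
        # Channel attention (squeeze-and-excitation style), standing in for the
        # paper's attention-based aggregation along the channel dimension.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // 4, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (N, C_in, T, V) -- batch, channels, frames, joints
        out = 0
        for k in range(self.A.shape[0]):
            # Mix joint features through adjacency A_k, then project channels.
            xk = torch.einsum("nctv,vw->nctw", x, self.A[k])
            out = out + self.convs[k](xk)
        return out * self.attn(out)    # channel-wise reweighting

if __name__ == "__main__":
    # Toy 4-joint chain with three candidate centers (hypothetical indices).
    edges = [(0, 1), (1, 2), (2, 3)]
    adjs = [build_adjacency(edges, center=c, num_joints=4) for c in (0, 1, 2)]
    block = MultiCenterGCNBlock(in_channels=3, out_channels=16, adjacencies=adjs)
    x = torch.randn(2, 3, 10, 4)       # (batch, xyz, frames, joints)
    print(block(x).shape)              # torch.Size([2, 16, 10, 4])
```

In this sketch every sample is passed through all centers and the results are summed; in the paper, as described in the abstract, the MCDS instead differentiates and selects among the centers per action.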
Journal Introduction
Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering