MCNet: A unified multi-center graph convolutional network based on skeletal behavior recognition

IF 6.8 | CAS Tier 2, Engineering & Technology | Q1 ENGINEERING, MULTIDISCIPLINARY | Alexandria Engineering Journal | Pub Date: 2025-02-12 | DOI: 10.1016/j.aej.2025.01.118
Haiping Zhang, Xinhao Zhang, Dongjing Wang, Fuxing Zhou, Junfeng Yan
Alexandria Engineering Journal, Volume 120, Pages 116–127. Article URL: https://www.sciencedirect.com/science/article/pii/S1110016825001462
Citations: 0

Abstract

The stability and computational efficiency of skeletal data make it a highly attractive modality for video action recognition. Although existing research on skeleton-based action recognition with graph convolutional networks (GCNs) has made progress, the fixed graph structure and the lack of interaction between objects in the dataset leave traditional models with too little flexibility to distinguish highly similar actions, which limits their final performance. To address these issues, we propose a unified multi-center graph convolutional network (MCNet) for skeleton-based action recognition. Actions with a large movement amplitude can shift the center of the human body. To recognize such actions, a multi-center training approach is proposed in which three centers are defined when constructing the topology graph, and a Multi-Center Data Selector (MCDS) differentiates among and selects these centers, improving the adaptability of the recognition task. Because some action categories are easily confused with one another, a multi-modal training scheme is also proposed to facilitate the recognition of highly similar actions: a large language model serves as a knowledge engine that provides textual descriptions of global actions at the different centers, enabling finer discrimination between actions and further improving recognition. Finally, an attention module aggregates the features of a multi-scale adjacency matrix along the channel dimension.

To verify the effectiveness of the proposed network, a series of ablation experiments and model analyses were conducted on three datasets, and the model was compared with other state-of-the-art models, including CTR-GCN, Info-GCN, and STF. The results show that the proposed model reaches the state-of-the-art level. MCNet outperforms CTR-GCN (the baseline) by 0.6% on X-Sub and 0.3% on X-View of the NTU RGB+D 60 dataset. On the NTU RGB+D 120 dataset the gains are even more pronounced, with improvements of up to 0.8% on both the X-Sub and X-Set benchmarks.
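The core multi-center idea from the abstract can be illustrated with a minimal sketch: build one adjacency matrix per candidate body center by linking that center joint to every other joint, then let a selector weight the center-specific graphs. The toy skeleton, the choice of center joints, the softmax weighting, and all names below are illustrative assumptions, not the paper's actual MCDS design.

```python
# Hypothetical sketch of the multi-center graph construction described in the
# abstract. The joint layout, edge list, and three center joints are assumed
# for illustration only.
import numpy as np

NUM_JOINTS = 7  # toy skeleton: spine base, neck, head, two hands, two feet
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (0, 5), (0, 6)]  # bone connections
CENTERS = [0, 1, 2]  # assumed centers: spine base, neck, head

def center_adjacency(center: int) -> np.ndarray:
    """Bone adjacency with self-loops, augmented with edges from every joint
    to `center`, then symmetrically normalized: D^-1/2 (A) D^-1/2."""
    A = np.eye(NUM_JOINTS)
    for i, j in EDGES:
        A[i, j] = A[j, i] = 1.0
    A[center, :] = A[:, center] = 1.0  # link the chosen center to all joints
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A @ D_inv_sqrt

def select_centers(scores: np.ndarray) -> np.ndarray:
    """Stand-in for the Multi-Center Data Selector: softmax-weight the
    center-specific graphs into one effective adjacency matrix."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    graphs = np.stack([center_adjacency(c) for c in CENTERS])
    return np.einsum("c,cij->ij", w, graphs)

# A sample-dependent score vector would come from the network; here it is fixed
# and favors the first center.
A_eff = select_centers(np.array([2.0, 0.5, 0.1]))
```

In the actual model the selector scores would be predicted per sample, and the resulting adjacency would feed the graph convolution; this sketch only shows how three center-augmented graphs can be blended into a single normalized topology.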
Source journal
Alexandria Engineering Journal (Engineering – General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Articles published per year: 1015
Review time: 43 days
Journal introduction: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering
Latest articles from this journal
An edge-available defect detection And Localization Flow Model
Numerical treatments of (IntVarFrac) order partial differential equations for cancer tumor disease based on non-singular kernel
Predicting the condition of small and medium-span bridges using hybrid machine learning
Efficient human pose estimation in complex coal mining scenes via Keypoint Partitioning Adaptive Convolution
Autonomous aerial pipeline detection and tracking using YOLOv8 and real-time control algorithms