Multi-temporal scale aggregation refinement graph convolutional network for skeleton-based action recognition

IF 3.4 | Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING | Computer Animation and Virtual Worlds | Pub Date: 2023-09-25 | DOI: 10.1002/cav.2221

Xuanfeng Li, Jian Lu, Jian Zhou, Wei Liu, Kaibing Zhang


Citations: 0



Multi-temporal scale aggregation refinement graph convolutional network for skeleton-based action recognition

Skeleton-based human action recognition is gaining significant attention and is widely applied in fields such as virtual reality and human-computer interaction systems. Recent studies have highlighted the effectiveness of graph convolutional network (GCN) based methods for this task, which have markedly improved prediction accuracy. However, most GCN-based methods overlook the varying contributions of the self, centripetal, and centrifugal subsets. In addition, they adopt only a single-scale temporal feature and ignore multi-temporal scale information. To address these issues, we first develop a refinement graph convolution that adaptively learns a weight for each subset feature, in order to differentiate the importance of the different skeleton subsets. Second, we propose a multi-temporal scale aggregation module to extract more discriminative temporal dynamic information. Furthermore, we propose a multi-temporal scale aggregation refinement graph convolutional network (MTSA-RGCN) and adopt a four-stream structure, which comprehensively models complementary features and achieves a significant performance boost. In empirical experiments, our approach substantially improves performance on both the NTU-RGB+D 60 and NTU-RGB+D 120 datasets compared to other state-of-the-art methods.
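The abstract does not give the equations behind the two modules, so the following is only a minimal NumPy sketch of the general ideas, not the authors' implementation. It assumes that the per-subset weights are softmax-normalized learnable scalars, that the temporal branches differ only in dilation, and that branches are aggregated by averaging. All function names, shapes, and parameters (`refinement_graph_conv`, `multi_scale_aggregate`, `subset_logits`, and so on) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()


def refinement_graph_conv(x, adj_subsets, weights, subset_logits):
    """Graph convolution with one importance weight per neighbor subset.

    x:             (T, V, C_in) features for T frames and V joints
    adj_subsets:   (K, V, V) adjacency per subset (e.g. self, centripetal,
                   centrifugal); row u lists the neighbors aggregated into joint u
    weights:       (K, C_in, C_out) channel projection per subset
    subset_logits: (K,) scalars that would be learned in training (assumption)
    """
    alpha = softmax(subset_logits)  # assumed normalization of subset weights
    out = 0.0
    for k in range(adj_subsets.shape[0]):
        # Aggregate neighbor features within subset k, then project channels.
        agg = np.einsum("uv,tvc->tuc", adj_subsets[k], x)
        out = out + alpha[k] * (agg @ weights[k])
    return out


def temporal_conv(x, kernel, dilation):
    """Temporal filter along the frame axis with zero 'same' padding.

    x: (T, V, C); kernel: (k,) with odd k, shared across joints and channels.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    n_frames = x.shape[0]
    return sum(
        kernel[i] * xp[i * dilation : i * dilation + n_frames] for i in range(k)
    )


def multi_scale_aggregate(x, kernel, dilations=(1, 2)):
    """Run one temporal branch per dilation and average them (assumed aggregation)."""
    branches = [temporal_conv(x, kernel, d) for d in dilations]
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    # Toy 3-joint chain 0-1-2 with joint 0 treated as the body center.
    self_adj = np.eye(3)
    centripetal = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
    centrifugal = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
    adj = np.stack([self_adj, centripetal, centrifugal])

    x = np.ones((4, 3, 2))                    # 4 frames, 3 joints, 2 channels
    w = np.stack([np.eye(2)] * 3)             # identity projections for the demo
    spatial = refinement_graph_conv(x, adj, w, np.zeros(3))
    fused = multi_scale_aggregate(spatial, np.ones(3) / 3)
    print(fused.shape)
```

In a trained model the subset logits, projections, and temporal kernels would be learned parameters, and each temporal branch would normally have its own kernel; they are fixed and shared here only to keep the sketch short.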


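Source journal

Computer Animation and Virtual Worlds (Engineering Technology, Computer Science: Software Engineering)

CiteScore: 2.20

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 6-12 weeks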

Journal description: With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.