Multi-temporal scale aggregation refinement graph convolutional network for skeleton-based action recognition

IF 0.9 | Zone 4 (Computer Science) | Q4 (Computer Science, Software Engineering) | Computer Animation and Virtual Worlds | Pub Date: 2023-09-25 | DOI: 10.1002/cav.2221

Xuanfeng Li, Jian Lu, Jian Zhou, Wei Liu, Kaibing Zhang



Abstract

Skeleton-based human action recognition is attracting significant attention and is widely applied in fields such as virtual reality and human-computer interaction systems. Recent studies have shown that graph convolutional network (GCN) based methods are effective for this task and have markedly improved prediction accuracy. However, most GCN-based methods overlook the varying contributions of the self, centripetal, and centrifugal subsets. In addition, they use only a single-scale temporal feature and ignore multi-temporal scale information. To address these issues, we first develop a refinement graph convolution that adaptively learns a weight for each subset feature, so that the importance of different skeleton subsets can be differentiated. Second, we propose a multi-temporal scale aggregation module to extract more discriminative temporal dynamic information. Building on these, we propose a multi-temporal scale aggregation refinement graph convolutional network (MTSA-RGCN) and adopt a four-stream structure, which comprehensively models complementary features and achieves a significant performance boost. In experiments on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets, our approach substantially improves performance compared with other state-of-the-art methods.
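The two ideas named in the abstract, a learned weight per skeleton subset and aggregation over several temporal scales, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the softmax normalization of the subset weights, the averaging windows used as "temporal scales", and the function names `refined_graph_conv` and `multi_scale_temporal` are all assumptions made purely for illustration. The toy skeleton is a three-joint chain 0-1-2 with joint 0 taken as the center.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def refined_graph_conv(features, subset_adjs, subset_logits):
    """Aggregate joint features per subset, then mix the subsets with weights.

    features:      V x C joint features for one frame
    subset_adjs:   one V x V adjacency per subset (self, centripetal, centrifugal)
    subset_logits: one scalar per subset; hypothetical stand-in for learned
                   parameters (softmax normalization is an assumption)
    """
    weights = softmax(subset_logits)
    num_joints, channels = len(features), len(features[0])
    out = [[0.0] * channels for _ in range(num_joints)]
    for adj, w in zip(subset_adjs, weights):
        aggregated = matmul(adj, features)
        for v in range(num_joints):
            for c in range(channels):
                out[v][c] += w * aggregated[v][c]
    return out

def multi_scale_temporal(sequence, window_sizes=(1, 3, 5)):
    """Average a per-frame signal over centered windows of several sizes
    and sum the results (an assumed, simplified notion of temporal scale)."""
    frames = len(sequence)
    out = [0.0] * frames
    for size in window_sizes:
        half = size // 2
        for t in range(frames):
            lo, hi = max(0, t - half), min(frames, t + half + 1)
            out[t] += sum(sequence[lo:hi]) / (hi - lo)
    return out

# Toy chain 0-1-2 with joint 0 as the center. Row v marks the joints in v's subset.
A_SELF = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A_CENTRIPETAL = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]  # neighbor closer to the center
A_CENTRIFUGAL = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # neighbor farther from the center

joint_features = [[1.0], [2.0], [3.0]]
equal_mix = refined_graph_conv(
    joint_features, [A_SELF, A_CENTRIPETAL, A_CENTRIFUGAL], [0.0, 0.0, 0.0])
smoothed = multi_scale_temporal([0.0, 0.0, 3.0, 0.0, 0.0], window_sizes=(1, 3))
```

With equal logits the three subsets contribute equally. Raising one logit shifts the output toward that subset's aggregated features, which is the kind of per-subset differentiation the abstract describes.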


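Source journal

Computer Animation and Virtual Worlds (Engineering and Technology: Computer Science, Software Engineering)

CiteScore: 2.20

Self-citation rate: 0.00%

Articles published: (no value given)

Review time: 6-12 weeks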

Journal description: With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.

Latest articles from the journal

Diverse Motions and Responses in Crowd Simulation
A Facial Motion Retargeting Pipeline for Appearance Agnostic 3D Characters
Enhancing Front-End Security: Protecting User Data and Privacy in Web Applications
Virtual Roaming of Cultural Heritage Based on Image Processing
PainterAR: A Self-Painting AR Interface for Mobile Devices