Beyond-Skeleton: Zero-shot Skeleton Action Recognition enhanced by supplementary RGB visual information

IF 7.5 | Zone 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Expert Systems with Applications | Pub Date: 2025-05-10 | Epub Date: 2025-02-15 | DOI: 10.1016/j.eswa.2025.126814

Hongjie Liu, Yingchun Niu, Kun Zeng, Chun Liu, Mengjie Hu, Qing Song

Expert Systems with Applications, Volume 273, Article 126814. Journal Article. https://www.sciencedirect.com/science/article/pii/S0957417425004361

Citations: 0

Abstract

Zero-shot action recognition (ZSAR) recognizes action categories that did not appear during training and has attracted widespread attention for its potential to reduce the costs of retraining and data annotation. We observe that existing skeleton-based ZSAR methods use only the human posture information in the skeleton sequence, lack discriminative semantic representations for distinguishing similar behaviors, and lack effective interaction between modalities, which leads to unsatisfactory performance and limits the applications of ZSAR. To address these problems, we propose a novel method, Beyond-Skeleton zero-shot Learning (BSZSL), to enhance zero-shot skeleton action recognition. First, we introduce a multi-prompt learning strategy. It uses prompt information to guide the model to learn complementary semantic information related to behavior categories from both skeleton sequences and RGB information simultaneously, making the visual features more expressive. Specifically, it employs a pre-trained multimodal model to extract behavior-related prior knowledge from RGB and then uses this knowledge to guide the skeleton sequence features, which strengthens the complementarity of the RGB and skeleton modalities. Second, to constrain the mapping between the features of different modalities, we design a Contrastive Clustering (CC) module. This module emphasizes the similarity of features within the same category while increasing the differences between the feature mappings of different categories. Finally, we evaluate our method on the NTU-60 and NTU-120 datasets under multiple split settings; the results show that it achieves state-of-the-art performance in both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings.
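The abstract does not give the form of the Contrastive Clustering objective, so the following is only a minimal sketch of the general idea it describes: pull features of the same category together, push different categories apart, and then label an unseen sample by its nearest class embedding. It assumes a generic supervised contrastive-style loss over cosine similarities; the function names, the `temperature` value, and the toy 2-D features are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_clustering_loss(features, labels, temperature=0.1):
    """Supervised contrastive-style loss (an assumption, not the paper's CC module).

    The loss is low when same-label features are more similar to each
    other than to features carrying a different label.
    """
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        # Scaled similarity from anchor i to every other sample.
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def zero_shot_predict(feature, class_embeddings):
    """Return the class name whose embedding is most similar to the feature."""
    return max(class_embeddings, key=lambda name: cosine(feature, class_embeddings[name]))


# Toy data: the same four labels, once with well-clustered features and
# once with the features of samples 1 and 2 swapped across categories.
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_clustered = contrastive_clustering_loss(clustered, labels)
loss_mixed = contrastive_clustering_loss(mixed, labels)

# Hypothetical semantic embeddings for two unseen classes.
unseen = {"wave": [1.0, 0.0], "kick": [0.0, 1.0]}
prediction = zero_shot_predict([0.8, 0.2], unseen)
```

In this sketch the well-clustered features yield a lower loss than the mixed ones, which is the behavior the abstract attributes to the CC module. In the ZSL setting the candidate classes passed to `zero_shot_predict` would be the unseen categories only, whereas GZSL would also include the seen ones.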




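Source journal: Expert Systems with Applications (Engineering Technology, Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months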

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.