Beyond-Skeleton: Zero-shot Skeleton Action Recognition enhanced by supplementary RGB visual information

IF 7.5 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Expert Systems with Applications | Pub Date: 2025-05-10 | Epub Date: 2025-02-15 | DOI: 10.1016/j.eswa.2025.126814
Hongjie Liu, Yingchun Niu, Kun Zeng, Chun Liu, Mengjie Hu, Qing Song
Expert Systems with Applications, Volume 273, Article 126814. Journal Article. https://www.sciencedirect.com/science/article/pii/S0957417425004361
Citations: 0

Abstract

Zero-shot action recognition (ZSAR) recognizes action categories that did not appear during training, and it has attracted wide attention because it can reduce the cost of retraining and data annotation. We observe that existing ZSAR methods based on skeleton sequences use only the human posture information in the skeleton sequence, lack discriminative semantic representations when recognizing similar behaviors, and lack effective interaction between modalities, which leads to unsatisfactory performance and limits the applications of ZSAR. To solve these problems, we propose a novel method, Beyond-Skeleton zero-shot Learning (BSZSL), to enhance zero-shot skeleton action recognition. First, we introduce a multi-prompt learning strategy. It uses prompt information to guide the model to learn, simultaneously from skeleton sequences and RGB information, complementary semantic information related to behavior categories, making the visual features more expressive. Specifically, it employs a pre-trained multimodal model to extract behavior-related prior knowledge from RGB and then uses this knowledge to guide the skeleton sequence features, which enhances the complementary features of the RGB and skeleton modalities. Second, to constrain the mapping between the features of different modalities, we design a Contrastive Clustering (CC) module. This module emphasizes the similarity of features within the same category while increasing the differences in feature mapping between different categories. Finally, we evaluate our method on the NTU-60 and NTU-120 datasets under multi-split settings; the results show that it achieves state-of-the-art performance in both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings.
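The abstract does not specify how the Contrastive Clustering (CC) module is formulated. As a rough illustration of the stated goal only (features of the same category pulled together, features of different categories pushed apart), the sketch below uses a generic supervised contrastive loss as a stand-in. The function name, the `temperature` parameter, and the toy features are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (a stand-in, NOT the paper's CC module).

    features: (n, d) array of embeddings, e.g. skeleton or RGB features.
    labels:   (n,) array of category ids.
    Lower values mean same-category features are closer to each other than
    to features of other categories.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = labels.shape[0]

    # L2-normalize so the dot product is cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    not_self = ~np.eye(n, dtype=bool)
    # Positives: other samples that share the anchor's category.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other samples, shifted for numerical stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a same-category partner

    # Average log-probability of the positives, per anchor that has any.
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


if __name__ == "__main__":
    # Toy 2-D features: the first two and the last two samples are close.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))  # labels match clusters
    print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))  # labels cut across clusters
```

In this toy setting the loss is smaller when the labels agree with the geometric clusters, which is the behavior the abstract attributes to the CC module. How the paper applies such a constraint across the skeleton, RGB, and semantic modalities is not described here.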
Source journal
Expert Systems with Applications
Category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.