Title: Attention-guided mask learning for self-supervised 3D action recognition
Authors: Haoyuan Zhang
Journal: Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.0)
Published: 2024-07-19
DOI: 10.1007/s40747-024-01558-1
URL: https://doi.org/10.1007/s40747-024-01558-1
Citation count: 0
Abstract
Most existing work on 3D action recognition relies on supervised learning, yet the scarcity of annotated data keeps encoding networks from reaching their full potential. As a result, effective self-supervised pre-training strategies have been actively researched. In this paper, we explore a self-supervised learning approach for 3D action recognition and propose the Attention-guided Mask Learning (AML) scheme. Specifically, a dropping mechanism is introduced into contrastive learning, yielding an Attention-guided Mask (AM) module and a mask learning strategy. The AM module uses spatial and temporal attention to guide the masking of the corresponding features, producing a masked contrastive object. The mask learning strategy enables the model to discriminate between actions even when important features are masked, which makes the learned action representations more discriminative. In addition, to relax the strict positive constraint that would otherwise hinder representation learning, a positive-enhanced learning strategy is applied in the second training stage. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets show that the proposed AML scheme improves performance in self-supervised 3D action recognition, achieving state-of-the-art results.
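The idea of using attention to decide which features to mask can be illustrated with a minimal sketch. The abstract does not say how the AM module computes attention or how many features it masks, so everything below is an assumption made for illustration: the function name `attention_guided_mask`, the use of summed absolute activation as a stand-in for spatial (per-joint) and temporal (per-frame) attention, and the parameters `k_joints` and `k_frames`. It is not the paper's implementation.

```python
def attention_guided_mask(features, k_joints=1, k_frames=1):
    """Zero out the most-attended joints and frames of a skeleton feature map.

    features: nested list indexed as [frame][joint][channel].
    Returns (masked_features, masked_joint_ids, masked_frame_ids).
    """
    num_frames = len(features)
    num_joints = len(features[0])

    # ASSUMPTION: summed absolute activation stands in for attention scores.
    spatial = [
        sum(abs(c) for t in range(num_frames) for c in features[t][v])
        for v in range(num_joints)
    ]
    temporal = [
        sum(abs(c) for v in range(num_joints) for c in features[t][v])
        for t in range(num_frames)
    ]

    # Select the k highest-scoring joints and frames as the ones to mask.
    top_joints = set(
        sorted(range(num_joints), key=lambda v: spatial[v], reverse=True)[:k_joints]
    )
    top_frames = set(
        sorted(range(num_frames), key=lambda t: temporal[t], reverse=True)[:k_frames]
    )

    # Build a masked copy; the input is left unmodified.
    masked = [
        [
            [0.0 if (t in top_frames or v in top_joints) else c
             for c in features[t][v]]
            for v in range(num_joints)
        ]
        for t in range(num_frames)
    ]
    return masked, top_joints, top_frames


# Toy input: 3 frames, 2 joints, 1 channel.
toy = [[[1.0], [0.1]], [[5.0], [0.2]], [[1.0], [0.3]]]
masked, joints, frames = attention_guided_mask(toy)
```

In a contrastive setup, the masked output would serve as an additional view of the same sequence, so that the encoder has to tell actions apart without its most salient joints and frames. How that view enters the loss is not specified in the abstract.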
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.