Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition
Authors: Ye Liu; Tianhao Shi; Mingliang Zhai; Jun Liu
Journal: IEEE Signal Processing Letters, vol. 32, pp. 546-550 (Journal Article)
Publication date: 2025-01-03
DOI: 10.1109/LSP.2024.3525398
URL: https://ieeexplore.ieee.org/document/10820965/
Impact factor: 3.2 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
In 3D skeleton-based action recognition, the limited availability of labeled data has driven interest in self-supervised learning. Reconstruction with a masked auto-encoder (MAE) is an effective, mainstream self-supervised approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may cause important information to be lost. To address this issue, we propose a frequency decoupled MAE. Specifically, a scale-specific frequency feature reconstruction module uses frequency information as a direct, explicit reconstruction target, which improves the MAE's ability to discern and accurately reproduce diverse frequency attributes of the data. Moreover, because frequency reconstruction makes the optimization objective more complex and can destabilize gradient updates, we introduce a dual-path network with an exponential moving average (EMA) parameter updating strategy to stabilize training. Extensive experiments demonstrate the effectiveness of the proposed method.
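The two ideas named in the abstract, a frequency-domain reconstruction target and an EMA parameter update, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how frequencies are decoupled, so the FFT split along the time axis, the `cutoff` parameter, the `(T, J, C)` sequence layout, and the function names `band_split`, `frequency_loss`, and `ema_update` are all assumptions made for illustration.

```python
import numpy as np


def band_split(x, cutoff):
    """Split a skeleton sequence into low- and high-frequency parts.

    x is assumed to have shape (T, J, C): frames, joints, coordinates.
    The split is a hard cut in the temporal FFT spectrum (an assumption,
    not the decomposition used in the paper).
    """
    spec = np.fft.rfft(x, axis=0)
    low = spec.copy()
    low[cutoff:] = 0          # keep only the first `cutoff` frequency bins
    high = spec - low         # everything else
    n = x.shape[0]
    return np.fft.irfft(low, n=n, axis=0), np.fft.irfft(high, n=n, axis=0)


def frequency_loss(pred, target, cutoff):
    """Mean squared error computed separately on each frequency band."""
    pred_low, pred_high = band_split(pred, cutoff)
    tgt_low, tgt_high = band_split(target, cutoff)
    low_term = np.mean((pred_low - tgt_low) ** 2)
    high_term = np.mean((pred_high - tgt_high) ** 2)
    return float(low_term + high_term)


def ema_update(teacher, student, momentum=0.99):
    """Return new teacher parameters as an exponential moving average.

    teacher and student are dicts mapping parameter names to arrays.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }
```

In a dual-path setup of this kind, one path is typically trained by gradient descent while the other is updated only through `ema_update`, so its parameters change slowly and provide a steadier reference during training. Whether the paper assigns the paths this way is not stated in the abstract.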
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.