Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (IF 13.7), Pub Date: 2025-01-01, DOI: 10.1109/TIP.2024.3522818

Mingqi Lu; Xiaobo Lu; Jun Liu

Volume 34, pages 295-305. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10820022/

Citations: 0

Abstract

In the field of semi-supervised skeleton action recognition, existing work primarily follows the paradigm of self-supervised training followed by supervised fine-tuning. However, self-supervised learning focuses on exploring data representation rather than label classification. Inspired by Mean Teacher, we explore a novel pseudo-label-based model called SkeleMoCLR. Specifically, we use MoCo v2 as the foundation and extend it into a teacher-student network through a momentum encoder. The generation of high-confidence pseudo-labels requires a well-pretrained model as a prerequisite. In cases where large-scale skeleton data is lacking, we propose leveraging contrastive learning to transfer discriminative action features from large vision-text models to the skeleton encoder. Following the contrastive pre-training, the key encoder branch from MoCo v2 serves as the teacher to generate pseudo-labels for training the query encoder branch. Furthermore, we introduce pseudo-labels into the memory queues, sampling negative samples from different pseudo-label classes to maximize the representation differentiation between different categories. We jointly optimize the classification loss for both labeled and pseudo-labeled data and the contrastive loss for unlabeled data to update model parameters, fully harnessing the potential of pseudo-label semi-supervised learning and self-supervised learning. Extensive experiments conducted on the NTU-60, NTU-120, PKU-MMD, and NW-UCLA datasets demonstrate that our SkeleMoCLR outperforms existing competitive methods in the semi-supervised skeleton action recognition task.
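The three mechanisms named in the abstract (a momentum-updated teacher, confidence-filtered pseudo-labels, and negatives drawn only from other pseudo-label classes) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the momentum value, the confidence threshold, the temperature, and the assumption that features are L2-normalized are all illustrative choices that the abstract does not specify.

```python
import math

def ema_update(teacher, student, m=0.999):
    """Momentum (EMA) update: teacher weights slowly track student weights.
    m=0.999 is an assumed value, not taken from the paper."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(teacher_logits, threshold=0.9):
    """Return (class_index, confidence) when the teacher is confident,
    otherwise None. The threshold is a hypothetical hyperparameter."""
    probs = softmax(teacher_logits)
    conf = max(probs)
    if conf < threshold:
        return None
    return probs.index(conf), conf

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def label_aware_info_nce(query, key, queue, queue_labels,
                         query_label=None, temperature=0.07):
    """InfoNCE loss for one query. Queue entries sharing the query's
    pseudo-label are dropped from the negatives; with query_label=None
    every queue entry is used (plain MoCo-style behavior).
    Features are assumed to be L2-normalized."""
    pos = dot(query, key) / temperature
    negs = [dot(query, k) / temperature
            for k, lab in zip(queue, queue_labels)
            if query_label is None or lab != query_label]
    logits = [pos] + negs
    mx = max(logits)
    log_denominator = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_denominator - pos  # equals -log softmax(positive)

# Toy usage: one same-class and one different-class entry in the queue.
query, key = [1.0, 0.0], [1.0, 0.0]
queue, queue_labels = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
loss_all = label_aware_info_nce(query, key, queue, queue_labels)
loss_filtered = label_aware_info_nce(query, key, queue, queue_labels,
                                     query_label=0)
```

In the toy usage, filtering removes the queue entry that shares the query's pseudo-label, so the query is no longer pushed away from a sample of its own class and the loss drops accordingly. How the paper handles queue entries without a confident pseudo-label is not stated in the abstract; the sketch simply treats them like any other labeled entry.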

Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Articles published