Collaboratively Self-Supervised Video Representation Learning for Action Recognition

IF 8 1区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS IEEE Transactions on Information Forensics and Security Pub Date : 2025-01-20 DOI:10.1109/TIFS.2025.3531772
Jie Zhang;Zhifan Wan;Lanqing Hu;Stephen Lin;Shuzhe Wu;Shiguang Shan
{"title":"Collaboratively Self-Supervised Video Representation Learning for Action Recognition","authors":"Jie Zhang;Zhifan Wan;Lanqing Hu;Stephen Lin;Shuzhe Wu;Shiguang Shan","doi":"10.1109/TIFS.2025.3531772","DOIUrl":null,"url":null,"abstract":"Considering the close connection between action recognition and human pose estimation, we design a Collaboratively Self-supervised Video Representation (CSVR) learning framework specific to action recognition by jointly factoring in generative pose prediction and discriminative context matching as pretext tasks. Specifically, our CSVR consists of three branches: a generative pose prediction branch, a discriminative context matching branch, and a video generating branch. Among them, the first one encodes dynamic motion feature by utilizing Conditional-GAN to predict the human poses of future frames, and the second branch extracts static context features by contrasting positive and negative video feature and I-frame feature pairs. The third branch is designed to generate both current and future video frames, for the purpose of collaboratively improving dynamic motion features and static context features. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple popular video datasets.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"1895-1907"},"PeriodicalIF":8.0000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10847948/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Considering the close connection between action recognition and human pose estimation, we design a Collaboratively Self-supervised Video Representation (CSVR) learning framework specific to action recognition by jointly factoring in generative pose prediction and discriminative context matching as pretext tasks. Specifically, our CSVR consists of three branches: a generative pose prediction branch, a discriminative context matching branch, and a video generating branch. Among them, the first one encodes dynamic motion feature by utilizing Conditional-GAN to predict the human poses of future frames, and the second branch extracts static context features by contrasting positive and negative video feature and I-frame feature pairs. The third branch is designed to generate both current and future video frames, for the purpose of collaboratively improving dynamic motion features and static context features. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple popular video datasets.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于动作识别的协同自监督视频表示学习
考虑到动作识别与人体姿态估计之间的密切联系,我们设计了一个针对动作识别的协同自监督视频表示(CSVR)学习框架,将生成式姿态预测和判别式上下文匹配作为借口任务。具体来说,我们的CSVR由三个分支组成:生成姿态预测分支,判别上下文匹配分支和视频生成分支。其中,第一个分支利用条件gan对动态运动特征进行编码,预测未来帧的人体姿态;第二个分支通过对比正负视频特征和i帧特征对提取静态上下文特征。第三个分支旨在生成当前和未来的视频帧,以协同改进动态运动特征和静态上下文特征。大量的实验表明,我们的方法在多个流行的视频数据集上达到了最先进的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Transactions on Information Forensics and Security
IEEE Transactions on Information Forensics and Security 工程技术-工程:电子与电气
CiteScore
14.40
自引率
7.40%
发文量
234
审稿时长
6.5 months
期刊介绍: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance and systems applications that incorporate these features
期刊最新文献
SeeGait: Synergistic Co-evolving Representations for Multimodal Gait Recognition via Hierarchical Multi-Stage Fusion One Trigger, Multiple Victims: Clean-Label Neighborhood Backdoor Attacks on Graph Neural Networks Component-Specific Prompt Tuning for Deepfake Detection GDetox : Purifying Backdoor Encoder in Graph Self-supervised Learning via Knowledge Distillation IFAD: Privacy-Preserving Isolation Forest Based Anomaly Detection in Public Cloud Environments
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1