Source-free Temporal Attentive Domain Adaptation for Video Action Recognition

Proceedings of the 2022 International Conference on Multimedia Retrieval
Pub Date: 2022-06-27
DOI: 10.1145/3512527.3531392

Peipeng Chen, A. J. Ma


Citations: 5

Abstract

With the rapid growth of video data, many video analysis techniques have been developed and have achieved success in recent years. To mitigate the distribution bias of video data across domains, unsupervised video domain adaptation (UVDA) has been proposed and has become an active research topic. Nevertheless, existing UVDA methods need access to source domain data during training, which may violate privacy policies and make transfer inefficient. To address this issue, we propose a novel source-free temporal attentive domain adaptation (SFTADA) method for video action recognition under a more challenging UVDA setting in which source domain data is not required for learning the target domain. In our method, an innovative Temporal Attentive aGgregation (TAG) module is designed to combine frame-level features with varying importance weights to generate a video-level representation. Without source domain data or label information in the target domain, including during testing, an MLP-based attention network is trained to approximate the attentive aggregation function based on class centroids. By minimizing frame-level and video-level loss functions, both the temporal and spatial domain shifts in cross-domain video data can be reduced. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method in solving the challenging source-free UVDA task.
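The attentive aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' TAG implementation: the abstract does not specify the MLP architecture, the weight normalization, or how class centroids enter training, so the two-layer tanh MLP, the softmax over frames, and every name below (`mlp_score`, `attentive_aggregate`, the toy dimensions) are assumptions chosen for illustration. The weights are random and untrained.

```python
import math
import random


def mlp_score(frame, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping one frame feature to a scalar score."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(frames, w1, b1, w2, b2):
    """Weight each frame by its softmax-normalized score, then sum the frames.

    Assumption: importance weights are a softmax over per-frame MLP scores.
    Returns the video-level feature and the per-frame weights.
    """
    weights = softmax([mlp_score(f, w1, b1, w2, b2) for f in frames])
    dim = len(frames[0])
    video = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return video, weights


# Toy usage: 5 frames, 4-dim features, 3 hidden units (all sizes are arbitrary).
rng = random.Random(0)
T, D, H = 5, 4, 3
frames = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w1 = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
b1 = [0.0] * H
w2 = [rng.gauss(0, 1) for _ in range(H)]
b2 = 0.0

video_feature, attention = attentive_aggregate(frames, w1, b1, w2, b2)
print(len(video_feature), [round(a, 3) for a in attention])
```

With all-zero output weights the scores are equal and the aggregation reduces to plain mean pooling over frames, which is the baseline that learned importance weights are meant to improve on.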
