{"title":"Source-free Temporal Attentive Domain Adaptation for Video Action Recognition","authors":"Peipeng Chen, A. J. Ma","doi":"10.1145/3512527.3531392","DOIUrl":null,"url":null,"abstract":"With the rapidly increasing video data, many video analysis techniques have been developed and achieved success in recent years. To mitigate the distribution bias of video data across domains, unsupervised video domain adaptation (UVDA) has been proposed and become an active research topic. Nevertheless, existing UVDA methods need to access source domain data during training, which may result in problems of privacy policy violation and transfer inefficiency. To address this issue, we propose a novel source-free temporal attentive domain adaptation (SFTADA) method for video action recognition under the more challenging UVDA setting, such that source domain data is not required for learning the target domain. In our method, an innovative Temporal Attentive aGgregation (TAG) module is designed to combine frame-level features with varying importance weights for video-level representation generation. Without source domain data and label information in the target domain and during testing, an MLP-based attention network is trained to approximate the attentive aggregation function based on class centroids. By minimizing frame-level and video-level loss functions, both the temporal and spatial domain shifts in cross-domain video data can be reduced. Extensive experiments on four benchmark datasets demonstrate the effectiveness of our proposed method in solving the challenging source-free UVDA task.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3512527.3531392","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
With the rapid growth of video data, many video analysis techniques have been developed and have achieved success in recent years. To mitigate the distribution bias of video data across domains, unsupervised video domain adaptation (UVDA) has been proposed and has become an active research topic. Nevertheless, existing UVDA methods need access to source domain data during training, which may violate privacy policies and cause transfer inefficiency. To address this issue, we propose a novel source-free temporal attentive domain adaptation (SFTADA) method for video action recognition under the more challenging UVDA setting in which source domain data is not available when learning the target domain. In our method, an innovative Temporal Attentive aGgregation (TAG) module is designed to combine frame-level features with varying importance weights to generate video-level representations. Without access to source domain data or target domain labels during training and testing, an MLP-based attention network is trained to approximate the attentive aggregation function based on class centroids. By minimizing frame-level and video-level loss functions, both the temporal and spatial domain shifts in cross-domain video data can be reduced. Extensive experiments on four benchmark datasets demonstrate the effectiveness of our proposed method in solving the challenging source-free UVDA task.
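The core idea of the TAG module described above, an MLP-based attention network that scores frame-level features and aggregates them into a video-level representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name TemporalAttentiveAggregation, the two-layer MLP form, the hidden size, and the use of PyTorch are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class TemporalAttentiveAggregation(nn.Module):
    """Minimal sketch: an MLP scores each frame feature, and the
    softmax-normalized scores weight the frames into a single
    video-level feature (hypothetical design, not the paper's code)."""

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Assumed two-layer MLP attention network
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim)
        scores = self.attn(frame_feats)                   # (batch, num_frames, 1)
        weights = torch.softmax(scores, dim=1)            # per-frame importance
        video_feat = (weights * frame_feats).sum(dim=1)   # (batch, feat_dim)
        return video_feat


if __name__ == "__main__":
    # Example: aggregate 16 frame features of dimension 2048 per video
    tag = TemporalAttentiveAggregation(feat_dim=2048)
    frames = torch.randn(4, 16, 2048)
    video_repr = tag(frames)
    print(video_repr.shape)  # torch.Size([4, 2048])
```

In the source-free setting, such an attention network would be trained without source data or target labels, e.g. guided by class centroids as the abstract states; the exact loss functions are defined in the paper and are not reproduced here.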