Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning
Authors: Tomu Hirata, Yusuke Mukuta, Tatsuya Harada
Venue: ACM Multimedia Asia
Publication date: 2021-12-01
DOI: 10.1145/3469877.3497692 (https://doi.org/10.1145/3469877.3497692)
Citations: 2
Abstract
The video understanding capability of video recognition models has improved significantly thanks to advances in deep learning techniques and the availability of diverse video datasets. However, video recognition models remain vulnerable to imperceptible perturbations, which limits the real-world use of deep video recognition models. We present a new benchmark for evaluating the robustness of action recognition classifiers to common corruptions, and show that a supervised contrastive learning framework is effective at learning discriminative and stable video representations, making deep video recognition models robust to common input corruptions. Experiments on action recognition with corrupted videos from the UCF101 and HMDB51 datasets, under a variety of common corruptions, show that the proposed method is highly robust.
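The abstract does not give the exact training objective. As a hedged illustration only, the sketch below implements a widely used formulation of supervised contrastive learning, the SupCon loss of Khosla et al. (2020), on a batch of precomputed clip embeddings. It assumes NumPy, a fixed temperature, and L2-normalised features; `supcon_loss` is a hypothetical helper, and the paper's actual objective may differ in details such as the temperature, the projection head, or how clips and views are sampled. The loss pulls together embeddings of clips that share an action label and pushes apart clips with different labels.

```python
import numpy as np


def supcon_loss(features: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive (SupCon) loss over a batch of embeddings.

    Hypothetical helper for illustration, not the paper's implementation.
    features: (N, D) clip embeddings, e.g. from a video encoder (assumption).
    labels:   (N,) integer action labels.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # (N, N) temperature-scaled similarities

    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Positives: other samples in the batch with the same label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with no positive contribute nothing
    if not valid.any():
        return 0.0

    # Exclude each anchor's similarity to itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-softmax over all other samples in the batch.
    row_max = np.max(sim, axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

    # Average log-probability of the positives for each anchor,
    # then average over anchors and negate.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    # Embeddings already clustered by label give a small loss ...
    clustered = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    # ... while embeddings that mix the classes give a large one.
    mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(supcon_loss(clustered, labels), supcon_loss(mixed, labels))
```

One plausible way to connect this to corruption robustness, stated here as an assumption rather than the paper's documented procedure, is to place corrupted or augmented views of each clip in the same batch with the clip's action label, so the loss encourages clean and corrupted views of the same action to map to nearby representations.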