{"title":"DTR: An Information Bottleneck Based Regularization Framework for Video Action Recognition","authors":"Jiawei Fan, Yu Zhao, Xie Yu, Lihua Ma, Junqi Liu, Fangqiu Yi, Boxun Li","doi":"10.1145/3503161.3548326","DOIUrl":null,"url":null,"abstract":"An optimal representation should contain the maximum task-relevant information and minimum task-irrelevant information, as revealed from Information Bottleneck Principle. In video action recognition, CNN based approaches have obtained better spatio-temporal representation by modeling temporal context. However, these approaches still suffer low generalization. In this paper, we propose a moderate optimization based approach called Dual-view Temporal Regularization (DTR) based on Information Bottleneck Principle for an effective and generalized video representation without sacrificing any efficiency of the model. On the one hand, we design Dual-view Regularization (DR) to constrain task-irrelevant information, which can effectively compress background and irrelevant motion information. On the other hand, we design Temporal Regularization (TR) to maintain task-relevant information by finding an optimal difference between frames, which benefits extracting sufficient motion information. The experimental results demonstrate: (1) DTR is orthogonal to temporal modeling as well as data augmentation, and it achieves general improvement on both model-based and data-based approaches; (2) DTR is effective among 7 different datasets, especially on motion-centric datasets i.e. SSv1/ SSv2, in which DTR gets 6%/3.8% absolute gains in top-1 accuracy.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503161.3548326","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
An optimal representation should contain the maximum task-relevant information and minimum task-irrelevant information, as revealed from Information Bottleneck Principle. In video action recognition, CNN based approaches have obtained better spatio-temporal representation by modeling temporal context. However, these approaches still suffer low generalization. In this paper, we propose a moderate optimization based approach called Dual-view Temporal Regularization (DTR) based on Information Bottleneck Principle for an effective and generalized video representation without sacrificing any efficiency of the model. On the one hand, we design Dual-view Regularization (DR) to constrain task-irrelevant information, which can effectively compress background and irrelevant motion information. On the other hand, we design Temporal Regularization (TR) to maintain task-relevant information by finding an optimal difference between frames, which benefits extracting sufficient motion information. The experimental results demonstrate: (1) DTR is orthogonal to temporal modeling as well as data augmentation, and it achieves general improvement on both model-based and data-based approaches; (2) DTR is effective among 7 different datasets, especially on motion-centric datasets i.e. SSv1/ SSv2, in which DTR gets 6%/3.8% absolute gains in top-1 accuracy.