{"title":"Channel-Wise Spatial Attention with Spatiotemporal Heterogeneous Framework for Action Recognition","authors":"Yiying Li, Yulin Li, Yanfei Gu","doi":"10.1145/3404555.3404592","DOIUrl":null,"url":null,"abstract":"Recent years have witnessed the effective of attention network based on two-stream for video action recognition. However, most methods adopt the same structure on spatial stream and temporal stream, which produce amount redundant information and often ignore the relevance among channels. In this paper, we propose a channel-wise spatial attention with spatiotemporal heterogeneous framework, a new approach to action recognition. First, we employ two different network structures for spatial stream and temporal stream to improve the performance of action recognition. Then, we design a channel-wise network and spatial network inspired by self-attention mechanism to obtain the fine-grained and salient information of the video. Finally, the feature of video for action recognition is generated by end-to-end training. Experimental results on the datasets HMDB51 and UCF101 shows our method can effectively recognize the actions in the video.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404592","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Recent years have witnessed the effectiveness of two-stream attention networks for video action recognition. However, most methods adopt the same structure for the spatial and temporal streams, which produces a large amount of redundant information and often ignores the relevance among channels. In this paper, we propose a channel-wise spatial attention with spatiotemporal heterogeneous framework, a new approach to action recognition. First, we employ two different network structures for the spatial and temporal streams to improve recognition performance. Then, inspired by the self-attention mechanism, we design a channel-wise network and a spatial network to capture fine-grained and salient information from the video. Finally, the video feature for action recognition is produced by end-to-end training. Experimental results on the HMDB51 and UCF101 datasets show that our method can effectively recognize actions in video.
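The abstract describes self-attention-style channel-wise and spatial weighting over convolutional feature maps but gives no implementation details. As a rough, framework-free illustration of the general idea (not the authors' architecture), the two weightings over a feature map of shape (C, H, W) might be sketched in NumPy as follows; the function names, the affinity formulation, and the residual connections are all assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    """Channel-wise self-attention sketch: re-weight each channel by its
    affinity with every other channel. feat: array of shape (C, H, W)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)             # flatten spatial dims: (C, N)
    affinity = softmax(x @ x.T, axis=-1)   # (C, C) channel-relevance map
    out = affinity @ x                     # mix channels by relevance
    return out.reshape(C, H, W) + feat     # residual connection (assumed)

def spatial_attention(feat):
    """Spatial self-attention sketch: re-weight each position by its
    affinity with every other position. feat: array of shape (C, H, W)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)             # (C, N) with N = H * W
    affinity = softmax(x.T @ x, axis=-1)   # (N, N) position-relevance map
    out = x @ affinity.T                   # mix positions by relevance
    return out.reshape(C, H, W) + feat     # residual connection (assumed)

# Toy usage: refine a random feature map with both attention sketches.
feat = np.random.rand(8, 4, 4).astype(np.float32)
refined = spatial_attention(channel_attention(feat))
print(refined.shape)
```

In the paper's framework such modules would be applied inside each stream of the two-stream network and trained end-to-end; the sketch above only shows the attention arithmetic on a single feature map.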