Authors: Yiying Li, Yulin Li, Yanfei Gu
Published in: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence (2020-04-23)
DOI: https://doi.org/10.1145/3404555.3404592
Channel-Wise Spatial Attention with Spatiotemporal Heterogeneous Framework for Action Recognition
Recent years have witnessed the effectiveness of two-stream attention networks for video action recognition. However, most methods adopt the same structure for the spatial stream and the temporal stream, which produces a large amount of redundant information and often ignores the relevance among channels. In this paper, we propose channel-wise spatial attention with a spatiotemporal heterogeneous framework, a new approach to action recognition. First, we employ two different network structures for the spatial stream and the temporal stream to improve the performance of action recognition. Then, we design a channel-wise network and a spatial network, both inspired by the self-attention mechanism, to obtain fine-grained and salient information from the video. Finally, the video feature used for action recognition is generated by end-to-end training. Experimental results on the HMDB51 and UCF101 datasets show that our method can effectively recognize the actions in a video.
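The abstract does not give the formulation of the channel-wise and spatial networks, so the following is only a minimal sketch of generic self-attention over channels and over spatial positions, not the authors' architecture. It assumes a feature map flattened to a C x N matrix (C channels, N = H*W positions), dot-product similarity, a residual connection, and sequential composition; the function names are hypothetical and all of these choices are illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def channel_attention(feat):
    """Reweight each channel by its similarity to every other channel.

    feat is a C x N matrix. Assumption: dot-product affinity plus residual.
    """
    affinity = matmul(feat, transpose(feat))      # C x C channel similarities
    weights = [softmax(row) for row in affinity]  # each row sums to 1
    return add(matmul(weights, feat), feat)       # C x N


def spatial_attention(feat):
    """Reweight each position by its similarity to every other position."""
    feat_t = transpose(feat)                      # N x C
    affinity = matmul(feat_t, feat)               # N x N position similarities
    weights = [softmax(row) for row in affinity]
    attended = transpose(matmul(weights, feat_t)) # back to C x N
    return add(attended, feat)


def channel_spatial_attention(feat):
    """Apply channel attention, then spatial attention.

    Assumption: the sequential order is chosen for illustration only.
    """
    return spatial_attention(channel_attention(feat))


if __name__ == "__main__":
    toy = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]      # C = 2 channels, N = 3 positions
    for row in channel_spatial_attention(toy):
        print(row)
```

Both functions preserve the C x N shape of the input, so in a two-stream setting such a block could in principle be placed after any convolutional stage of either stream. A real implementation would use learned projections and a tensor library rather than plain lists.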