Authors: Xu Li, Liqiang Wen, Jinjun Wang, Ming Zeng
Published in: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020-06-01
DOI: 10.1109/ICAICA50127.2020.9182498 (https://doi.org/10.1109/ICAICA50127.2020.9182498)
Spatio-temporal Collaborative Convolution for Video Action Recognition
Although video action recognition has made great progress in recent years, it remains a challenging task because of its high computational cost. Designing a lightweight network is a feasible solution, but it may weaken the network's ability to model spatio-temporal information. In this paper, we propose a novel spatio-temporal collaborative convolution (denoted “STC-Conv”) that efficiently encodes spatio-temporal information. STC-Conv learns spatial and temporal features collaboratively within a single convolution kernel. In short, temporal convolution and spatial convolution are integrated into one STC convolution kernel, which effectively reduces model complexity and improves computational efficiency. STC-Conv is a general-purpose convolution that can be applied to existing 2D CNNs such as ResNet and DenseNet. Experimental results on the temporally sensitive Something-Something V1 dataset demonstrate the superiority of our method. Notably, STC-Conv outperforms 3D CNNs at a computational cost even lower than that of standard 2D CNNs.
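
The abstract does not spell out how the kernel is built, so the following is only a rough illustration of why sharing one layer between spatial and temporal filters could cost less than a standard 2D convolution. It assumes a hypothetical layout in which a fraction of the output channels uses a 3x1x1 temporal kernel and the rest uses a 1x3x3 spatial kernel. The split ratio, the channel counts, and the function names are all assumptions made for this sketch and are not taken from the paper.

```python
def conv_params(c_in, c_out, kt, kh, kw):
    """Weight count of a bias-free convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def split_kernel_params(c_in, c_out, temporal_ratio=0.25, k=3):
    """Weight count when output channels are split between two kernel shapes.

    Hypothetical layout (an assumption, not the paper's stated design):
    a fraction `temporal_ratio` of the output channels uses a k x 1 x 1
    temporal kernel, and the remaining channels use a 1 x k x k spatial kernel.
    """
    c_temporal = int(c_out * temporal_ratio)
    c_spatial = c_out - c_temporal
    temporal = conv_params(c_in, c_temporal, k, 1, 1)
    spatial = conv_params(c_in, c_spatial, 1, k, k)
    return temporal + spatial


# Illustrative channel counts, chosen only for this example.
c_in, c_out = 64, 64
params_2d = conv_params(c_in, c_out, 1, 3, 3)     # standard 2D conv (1 x 3 x 3)
params_3d = conv_params(c_in, c_out, 3, 3, 3)     # standard 3D conv (3 x 3 x 3)
params_split = split_kernel_params(c_in, c_out)   # hypothetical split kernel

print(params_2d, params_3d, params_split)  # 36864 110592 30720
```

Under these assumptions the split layer needs fewer weights than the plain 2D layer, because each temporal filter has 3 weights per input channel instead of 9. For a fixed output size, multiply-accumulate counts scale by the same factors.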