{"title":"基于多图卷积网络融合的动作识别","authors":"Camille Maurice, F. Lerasle","doi":"10.1109/AVSS52988.2021.9663765","DOIUrl":null,"url":null,"abstract":"We propose two light-weight and specialized Spatio-Temporal Graph Convolutional Networks (ST-GCNs): one for actions characterized by the motion of the human body and a novel one we especially design to recognize particular objects configurations during human actions execution. We propose a late-fusion strategy of the predictions of both graphs networks to get the most out of the two and to clear out ambiguities in the action classification. This modular approach enables us to reduce memory cost and training times. Moreover we also propose the same late fusion mechanism to further improve the performance using a Bayesian approach. We show results on 2 public datasets: CAD-120 and Watch-n-Patch. Our late-fusion mechanism yields performance gains in accuracy of respectively + 21 percentage points (pp), + 7 pp on Watch-n-Patch and CAD-120 compared to the individual graphs. Our approach outperforms most of the significant existing approaches.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Action Recognition with Fusion of Multiple Graph Convolutional Networks\",\"authors\":\"Camille Maurice, F. Lerasle\",\"doi\":\"10.1109/AVSS52988.2021.9663765\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose two light-weight and specialized Spatio-Temporal Graph Convolutional Networks (ST-GCNs): one for actions characterized by the motion of the human body and a novel one we especially design to recognize particular objects configurations during human actions execution. We propose a late-fusion strategy of the predictions of both graphs networks to get the most out of the two and to clear out ambiguities in the action classification. This modular approach enables us to reduce memory cost and training times. Moreover we also propose the same late fusion mechanism to further improve the performance using a Bayesian approach. We show results on 2 public datasets: CAD-120 and Watch-n-Patch. Our late-fusion mechanism yields performance gains in accuracy of respectively + 21 percentage points (pp), + 7 pp on Watch-n-Patch and CAD-120 compared to the individual graphs. 
Our approach outperforms most of the significant existing approaches.\",\"PeriodicalId\":246327,\"journal\":{\"name\":\"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AVSS52988.2021.9663765\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AVSS52988.2021.9663765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We propose two lightweight, specialized Spatio-Temporal Graph Convolutional Networks (ST-GCNs): one for actions characterized by the motion of the human body, and a novel one designed specifically to recognize particular object configurations during the execution of human actions. We propose a late-fusion strategy over the predictions of both graph networks to get the most out of the two and to resolve ambiguities in the action classification. This modular approach reduces memory cost and training time. Moreover, we apply the same late-fusion mechanism with a Bayesian approach to further improve performance. We report results on two public datasets, CAD-120 and Watch-n-Patch. Compared to the individual graphs, our late-fusion mechanism yields accuracy gains of +21 percentage points (pp) on Watch-n-Patch and +7 pp on CAD-120. Our approach outperforms most of the significant existing approaches.
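The late-fusion step described in the abstract can be illustrated with a minimal sketch (not the authors' code): assuming each ST-GCN outputs a softmax probability vector over the same set of action classes, the fused prediction is obtained either by averaging the two vectors or, in a Bayesian flavor, by multiplying the class posteriors under a conditional-independence assumption and renormalizing. The function name, the independence assumption, and the example scores below are illustrative, not taken from the paper.

import numpy as np

def late_fusion(body_scores: np.ndarray, object_scores: np.ndarray,
                mode: str = "average") -> int:
    """Fuse per-class scores from two specialized networks; return the class index.

    body_scores / object_scores: 1-D arrays of softmax probabilities over the
    same action classes, one from each graph network.
    """
    if mode == "average":
        # Simple late fusion: average the two probability vectors.
        fused = 0.5 * (body_scores + object_scores)
    elif mode == "bayesian":
        # Naive-Bayes-style fusion: treat the two networks as conditionally
        # independent, multiply their class posteriors, then renormalize.
        fused = body_scores * object_scores
        fused /= fused.sum()
    else:
        raise ValueError(f"unknown fusion mode: {mode}")
    return int(np.argmax(fused))

# Example with hypothetical scores over 4 action classes:
body = np.array([0.10, 0.60, 0.20, 0.10])   # body-motion ST-GCN
obj  = np.array([0.05, 0.30, 0.55, 0.10])   # object-configuration ST-GCN
print(late_fusion(body, obj, mode="average"))   # -> 1
print(late_fusion(body, obj, mode="bayesian"))  # -> 1

In such a scheme the two networks are trained separately and only their output distributions are combined, which is consistent with the modularity and reduced training cost claimed in the abstract.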