{"title":"Heterogeneous Graph Convolutional Network for Visual Reinforcement Learning of Action Detection","authors":"Liangliang Wang, Chengxi Huang, Xinwei Chen","doi":"10.1109/CACRE58689.2023.10208414","DOIUrl":null,"url":null,"abstract":"Existing action detection approaches do not take spatio-temporal structural relationships of action clips into account, which leads to a low applicability in real-world scenarios and can benefit detecting if exploited. To this end, this paper proposes to formulate the action detection problem as a reinforcement learning process which is rewarded by observing both the clip sampling and classification results via adjusting the detection schemes. In particular, our framework consists of a heterogeneous graph convolutional network to represent the spatio-temporal features capturing the inherent relation, a policy network which determines the probabilities of a predefined action sampling spaces, and a classification network for action clip recognition. We accomplish the network joint learning by considering the temporal intersection over union and Euclidean distance between detected clips and ground-truth. Experiments on ActivityNet v1.3 and THUMOS14 demonstrate our method.","PeriodicalId":447007,"journal":{"name":"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CACRE58689.2023.10208414","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Existing action detection approaches do not take the spatio-temporal structural relationships of action clips into account, which limits their applicability in real-world scenarios, even though these relationships can benefit detection if exploited. To address this, this paper formulates action detection as a reinforcement learning process in which the agent adjusts its detection scheme and is rewarded based on both the clip sampling and classification results. In particular, our framework consists of a heterogeneous graph convolutional network that represents the spatio-temporal features and captures their inherent relations, a policy network that determines the probabilities over a predefined action sampling space, and a classification network for action clip recognition. We jointly train the networks with a reward based on the temporal intersection over union (tIoU) and the Euclidean distance between detected clips and the ground truth. Experiments on ActivityNet v1.3 and THUMOS14 demonstrate the effectiveness of our method.
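The abstract names the two reward components (tIoU and Euclidean distance between detected clips and ground truth) but not their exact combination. As a rough illustration only, the sketch below computes a clip-level reward that mixes the two terms with hypothetical weights alpha and beta and a length-based normalization; none of these choices are taken from the paper.

import numpy as np

def temporal_iou(clip, gt):
    # Temporal intersection-over-union between two [start, end] intervals.
    start = max(clip[0], gt[0])
    end = min(clip[1], gt[1])
    inter = max(0.0, end - start)
    union = (clip[1] - clip[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def clip_reward(clip, gt, alpha=0.5, beta=0.5):
    # Hypothetical reward: reward overlap (higher tIoU is better) and
    # penalize the Euclidean distance between clip boundaries and ground
    # truth (lower is better). The weights and normalization are assumptions.
    tiou = temporal_iou(clip, gt)
    dist = np.linalg.norm(np.asarray(clip, dtype=float) - np.asarray(gt, dtype=float))
    norm_dist = dist / max(gt[1] - gt[0], 1e-6)  # scale by ground-truth length
    return alpha * tiou - beta * norm_dist

# Example: a detected clip [12.0, 20.0] scored against a ground-truth action [10.0, 22.0].
print(clip_reward([12.0, 20.0], [10.0, 22.0]))

In a reinforcement learning setup of this kind, such a scalar reward would be fed back to the policy network after each sampling-and-classification step; the actual reward shaping used by the authors may differ.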