Temporal Action Detection Based on Hierarchical Object Detection Networks
Yi-Hui Wu, Wen-Jiin Tsai, Hua-Tsung Chen
2019 Twelfth International Conference on Ubi-Media Computing (Ubi-Media), published 2019-08-01
DOI: 10.1109/Ubi-Media.2019.00031 (https://doi.org/10.1109/Ubi-Media.2019.00031)
Abstract
This paper addresses the problem of temporal action detection in untrimmed videos. Because actions can be recognized from the objects that appear in a video and from how those objects move, we propose a hierarchical model that consists of two object detection networks. The first network detects objects in each frame, and the second performs temporal action detection. We also propose a method that converts the object detection results of the first network into a new type of frame that can be fed to the second network. The generated frame has six channels that carry spatiotemporal information beneficial to action detection. Quantitative results on the THUMOS14 dataset demonstrate the superiority of the proposed model, with performance gains over state-of-the-art action detection methods.
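The conversion step, in which per-frame detections become a multi-channel frame for the second network, can be illustrated with a minimal sketch. The abstract does not say what the six channels contain, so the layout here is purely an assumption: one channel per frame in a window of six consecutive frames, with each detected box painted with its confidence score. The function name `detections_to_frame` and the `(x1, y1, x2, y2, score)` box format are likewise hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical detection format: (x1, y1, x2, y2, score) in pixel coordinates,
# with x2 and y2 exclusive. The paper's actual format is not given in the abstract.
Box = Tuple[int, int, int, int, float]


def detections_to_frame(
    window: List[List[Box]], height: int, width: int
) -> List[List[List[float]]]:
    """Rasterize detections from consecutive video frames into one multi-channel frame.

    ASSUMPTION: one output channel per source frame in `window`; each pixel
    holds the highest detection score covering it, or 0.0 if no box covers it.
    Returns a nested list indexed as frame[channel][y][x].
    """
    frame = [[[0.0] * width for _ in range(height)] for _ in window]
    for c, boxes in enumerate(window):
        for x1, y1, x2, y2, score in boxes:
            # Clip the box to the frame bounds.
            x1, x2 = max(0, x1), min(width, x2)
            y1, y2 = max(0, y1), min(height, y2)
            for y in range(y1, y2):
                row = frame[c][y]
                for x in range(x1, x2):
                    # Keep the highest score where boxes overlap.
                    row[x] = max(row[x], score)
    return frame


# Six consecutive frames, each with one 2x2 detection shifted right by one pixel.
window = [[(i, 1, i + 2, 3, 0.9)] for i in range(6)]
frame = detections_to_frame(window, height=4, width=8)
print(len(frame), len(frame[0]), len(frame[0][0]))  # prints: 6 4 8
```

Under this assumed layout, the position of an object within each channel carries the spatial information, and the shift of that position from one channel to the next carries the motion, which is one plausible way a single frame could hold spatiotemporal cues for a detector.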