{"title":"Novel Activities Detection Algorithm in Extended Videos","authors":"L. Yao, Ying Qian","doi":"10.1109/WACVW.2019.00009","DOIUrl":null,"url":null,"abstract":"Due to participation in TRECVID ActEV[1] competition, we conduct research on temporal activity recognition. In this paper, we propose a system for activity detection and localize detected activities temporally in extended videos. Our system firstly detects objects in video frames. Secondly, we use position information of detected object, as input to the object tracking model, which can obtain motion information of multiple objects in consecutive frames. Lastly, we input consecutive video frames containing only detected objects into 3D Convolutional Neural Network to achieve features, and 3D CNN is followed by a recurrent neural network for accurately localizing the detected activity.","PeriodicalId":254512,"journal":{"name":"2019 IEEE Winter Applications of Computer Vision Workshops (WACVW)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Winter Applications of Computer Vision Workshops (WACVW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACVW.2019.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Due to participation in TRECVID ActEV[1] competition, we conduct research on temporal activity recognition. In this paper, we propose a system for activity detection and localize detected activities temporally in extended videos. Our system firstly detects objects in video frames. Secondly, we use position information of detected object, as input to the object tracking model, which can obtain motion information of multiple objects in consecutive frames. Lastly, we input consecutive video frames containing only detected objects into 3D Convolutional Neural Network to achieve features, and 3D CNN is followed by a recurrent neural network for accurately localizing the detected activity.