{"title":"Detection of Text from Video with Customized Trained Anatomy","authors":"Manasa Devi Mortha, S. Maddala, V. Raju","doi":"10.1145/3460620.3460623","DOIUrl":null,"url":null,"abstract":"With the influence of diverse architectures like ImageNet, VGGNet, ResNet for detection of objects in images, we are proposing a novel architecture for detection of text in video. It is challenging to detect text candidates due to its nature of properties that varies from normal objects in terms of contours, connectionist, size, scaling to motion occlusion, color contrast, poor illumination, etc. Also, it is not possible to apply the existing architecture for the proposed anatomy with incompatibility in targets, parameters. Hence, working on video takes different path of learning and validation. The proposed architecture reads the temporal data to train the sequence of learning features. These features are fed to periodic connectionist to learn successive features to obtain the text candidate. Later, representation of the features are fed to regional proposal network to obtain the regions of interest by comparing with the ground-truth data followed by pooling the text regions with bounding box and finding the probability of their occurrence. The proposed structure evaluated on an ICDAR 2013 “Text in Video” dataset of different indoor and outdoor videos achieves high detection rates and performed better than labeled features.","PeriodicalId":36824,"journal":{"name":"Data","volume":"22 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1145/3460620.3460623","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Influenced by the success of architectures such as VGGNet and ResNet, trained on datasets like ImageNet, for object detection in images, we propose a novel architecture for detecting text in video. Text candidates are challenging to detect because their properties differ from those of ordinary objects in contour, connectivity, size, and scale, and because of motion occlusion, low color contrast, poor illumination, and similar conditions. Moreover, existing architectures cannot be applied directly to the proposed anatomy because their targets and parameters are incompatible. Working with video therefore takes a different path of learning and validation. The proposed architecture reads temporal data to train a sequence of learned features. These features are fed to a periodic connectionist (recurrent) unit that learns successive features to obtain text candidates. The resulting feature representations are then fed to a region proposal network, which obtains regions of interest by comparison with ground-truth data, pools the text regions into bounding boxes, and estimates the probability of their occurrence. Evaluated on the ICDAR 2013 "Text in Video" dataset of diverse indoor and outdoor videos, the proposed structure achieves high detection rates and outperforms labeled features.
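The pipeline the abstract describes (per-frame features, a recurrent "periodic connectionist" unit over successive frames, then a region-proposal head that scores and refines candidate boxes) can be illustrated with a short PyTorch sketch. Everything below is an illustrative assumption rather than the authors' published implementation: the module name `VideoTextProposer`, the layer sizes, the anchor count, and the GRU-based temporal fusion are all stand-ins.

```python
# Minimal, hypothetical sketch of the described pipeline:
# per-frame conv features -> recurrent unit over time -> RPN-style head
# producing a text/non-text score and box offsets per anchor location.
# Names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class VideoTextProposer(nn.Module):
    def __init__(self, num_anchors: int = 9, hidden: int = 128):
        super().__init__()
        # Per-frame feature extractor (stand-in for a VGG/ResNet backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Recurrent ("periodic connectionist") unit carrying features
        # across successive frames; a GRU over pooled per-frame
        # descriptors keeps the sketch short.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.fuse = nn.Linear(hidden, 64)  # project temporal context back
        # Region-proposal-style head: objectness + 4 box deltas per anchor.
        self.rpn_conv = nn.Conv2d(64, 64, 3, padding=1)
        self.obj = nn.Conv2d(64, num_anchors, 1)      # P(text) per anchor
        self.reg = nn.Conv2d(64, num_anchors * 4, 1)  # box refinements

    def forward(self, clip: torch.Tensor):
        # clip: (batch, time, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # (b*t, 64, h', w')
        # Temporal pass over pooled per-frame descriptors.
        seq = self.pool(feats).reshape(b, t, 64)
        temporal, _ = self.rnn(seq)                          # (b, t, hidden)
        # Inject the final temporal context into the last frame's map,
        # then run the proposal head on the fused features.
        last = feats.reshape(b, t, *feats.shape[1:])[:, -1]  # (b, 64, h', w')
        ctx = self.fuse(temporal[:, -1])                     # (b, 64)
        x = torch.relu(self.rpn_conv(last + ctx[:, :, None, None]))
        scores = torch.sigmoid(self.obj(x))  # text probability per location
        deltas = self.reg(x)                 # bounding-box offsets
        return scores, deltas


if __name__ == "__main__":
    model = VideoTextProposer()
    clip = torch.randn(2, 5, 3, 128, 128)  # 2 clips of 5 frames each
    scores, deltas = model(clip)
    print(scores.shape, deltas.shape)      # (2, 9, 32, 32), (2, 36, 32, 32)
```

A GRU over pooled descriptors is the simplest stand-in for the recurrent component; a ConvLSTM over full feature maps, plus Faster R-CNN-style anchor matching against ground-truth boxes and ROI pooling, would bring the sketch closer to the system the abstract describes.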