Text Localization and Script Identification in Natural Scene Images and Videos
Chandana Udupa, Anusha Upadhyaya, Basanagoud S. Patil, S. Seeri, Prakashgoud Patil, P. Hiremath
2022 International Conference on Connected Systems & Intelligence (CSI), published 2022-08-31
DOI: https://doi.org/10.1109/CSI54720.2022.9924044
Abstract
Text detection and script identification in natural scene images and videos have attracted the attention of many researchers in recent years because of their application in the design of computer vision devices for visually impaired people, global tourists travelling in unfamiliar places, and others, helping them understand the textual information displayed on sign boards, billboards, public notice boards, etc. The objective of the proposed method is the detection and localization of multilingual text in natural scene video images and the identification of the corresponding script. Texts in three languages, namely English, Hindi and Kannada, are considered. In the proposed method, the CNN-based YOLOv5 is used for text detection and localization in real-time videos of natural scenes, and it is also trained for script identification. YOLOv5 is found to yield higher accuracy than other object detection algorithms. The proposed model is trained with a custom dataset containing video images of natural scenes and is tested for different scenarios such as texts in different backgrounds, fonts, orientations, and resolutions, and disturbances in the images. The experimental results demonstrate the effectiveness and robustness of the proposed method. Its performance is compared with other methods in the literature.
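One way to picture how a single detector can cover both localization and script identification is to treat each script as an object class, so that every predicted box carries a script label. The snippet below is a minimal sketch under that assumption, not the authors' pipeline: the class ids and the function name `to_yolo_label` are hypothetical, since the abstract does not say how the annotations were encoded. The line layout it produces (`class x_center y_center width height`, with coordinates normalized to the image size) is the standard YOLOv5 annotation format.

```python
# Hypothetical class ids: the abstract does not state how scripts are encoded.
SCRIPT_CLASSES = {"English": 0, "Hindi": 1, "Kannada": 2}


def to_yolo_label(script, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line.

    image_size is (width, height) in pixels. The returned string has the form
    "class x_center y_center width height" with all coordinates in [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size

    # Box centre and size, normalized by the image dimensions.
    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height

    class_id = SCRIPT_CLASSES[script]
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


if __name__ == "__main__":
    # A Kannada text region in a 640x480 video frame (illustrative values only).
    print(to_yolo_label("Kannada", (100, 50, 300, 150), (640, 480)))
    # prints: 2 0.312500 0.208333 0.312500 0.208333
```

With labels in this form, one detection pass yields both the text location (the box) and the script (the class), which matches the abstract's description of one model trained for both tasks.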