Text Localization and Script Identification in Natural Scene Images and Videos

2022 International Conference on Connected Systems & Intelligence (CSI) Pub Date : 2022-08-31 DOI:10.1109/CSI54720.2022.9924044

Chandana Udupa, Anusha Upadhyaya, Basanagoud S. Patil, S. Seeri, Prakashgoud Patil, P. Hiremath



Abstract

Text detection and script identification in natural scene images and videos have attracted considerable research attention in recent years because of their application in the design of computer vision devices for visually impaired people, global tourists travelling in unfamiliar places, and others, helping them understand the textual information displayed on sign boards, billboards, public notice boards, etc. The objective of the proposed method is the detection and localization of multilingual text in natural scene video images and the identification of the corresponding script. Texts in three languages, namely English, Hindi and Kannada, are considered. In the proposed method, CNN-based YOLOv5 is used for text detection and localization in real-time videos of natural scenes, and it is also trained for script identification. YOLOv5 is found to yield a higher accuracy than other object detection algorithms. The proposed model is trained with a custom dataset containing video images of natural scenes and is tested for different scenarios, such as text with different backgrounds, fonts, orientations, resolutions, and disturbances in the images. The experimental results demonstrate the effectiveness and robustness of the proposed method. Its performance is compared with other methods in the literature.
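As an illustration of how a single detector can localize text and identify its script at once, the sketch below decodes one annotation line in the standard YOLO label format (class index followed by a normalized box centre, width and height) into a pixel-space box tagged with a script name. The class ordering in `SCRIPTS`, the `TextBox` type and the `parse_yolo_label` helper are hypothetical names introduced here; the abstract does not state how the authors indexed their classes or structured their dataset.

```python
from dataclasses import dataclass

# Assumed class order; the abstract does not give the label indices used.
SCRIPTS = ("English", "Hindi", "Kannada")


@dataclass(frozen=True)
class TextBox:
    """A localized text region with its script label, in pixel coordinates."""
    script: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_yolo_label(line: str, img_w: int, img_h: int) -> TextBox:
    """Convert one YOLO-format label line to a pixel-space TextBox.

    The line holds: class_id x_center y_center width height,
    with the four box values normalized to the range [0, 1].
    """
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = float(cx), float(cy), float(w), float(h)
    # Scale the normalized centre/size to pixels and convert to corners.
    x_min = round((cx - w / 2) * img_w)
    y_min = round((cy - h / 2) * img_h)
    x_max = round((cx + w / 2) * img_w)
    y_max = round((cy + h / 2) * img_h)
    return TextBox(SCRIPTS[int(cls)], x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    # A hypothetical label for a 640x480 frame.
    print(parse_yolo_label("2 0.5 0.5 0.5 0.25", 640, 480))
```

Treating each script as a separate object class is one straightforward way to have the detector emit the box and the script label together; whether the paper follows exactly this labelling scheme is an assumption.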


