Deep Learning and Particle Swarm Optimisation-Based Techniques for Visually Impaired Humans' Text Recognition and Identification

Augmented Human Research. Pub Date: 2021-10-29. DOI: 10.1007/s41133-021-00051-5

Binay Kumar Pandey, Digvijay Pandey, Subodh Wariya, Gaurav Aggarwal, Rahul Rastogi


Citations: 7

Abstract

Blind people can benefit greatly from a system that localises and reads text embedded in natural scenes and provides useful information that boosts their self-esteem and autonomy in everyday situations. Although existing optical character recognition programmes seem quick and effective, most of them cannot correctly recognise text embedded in ordinary scene images. The methodology described in this paper localises textual image regions and pre-processes them using the naïve Bayesian algorithm. A weighted reading technique is used to generate the correct text data from complex image regions. Images usually contain some noise, so filtering is proposed during the early pre-processing step. To restore the image's quality, the input image is processed using gradient and contrast image methods. The contrast of the source images is then enhanced using an adaptive image map. The stroke width transform, the Gabor transform, and a weighted naïve Bayesian classifier are applied to complex degraded images for segmentation, feature extraction, and detection of textual and non-textual elements. Finally, a combination of deep neural networks and particle swarm optimisation is used to recognise the classified textual data. After recognition, the text in the image is converted into audio output. The IIIT5K dataset is used for development, and the performance of the proposed approach is evaluated using metrics such as accuracy, recall, precision, and F1-score.
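The abstract does not say how particle swarm optimisation is coupled to the deep neural network. As a purely illustrative sketch of the optimiser itself, the Python below runs a generic particle swarm on a stand-in objective (a sphere function). The function name `pso_minimise`, the coefficient values, and the objective are assumptions made for illustration, not the authors' implementation; in a recognition setting the objective might instead be a network's validation loss as a function of its hyperparameters or weights.

```python
import random


def pso_minimise(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm optimisation (hypothetical helper).

    Returns the best position found and its objective value.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimise(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_position, best_value)
```

The swarm keeps only two pieces of memory, each particle's personal best and the shared global best, which is why the method needs no gradients and can be wrapped around any scalar score.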
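The evaluation metrics named in the abstract (accuracy, recall, precision, and F1-score) can be computed from the counts of true and false positives and negatives. The sketch below assumes a binary text versus non-text labelling (1 = text, 0 = non-text); the function name `binary_metrics` and the sample labels are made up for illustration and are not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 1 = text region, 0 = non-text region.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For these sample labels there are 4 true positives, 2 true negatives, 1 false positive and 1 false negative, giving an accuracy of 0.75 and precision, recall and F1 of 0.8 each.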




Source journal

Augmented Human Research

Self-citation rate

0.00%

Articles published