Deep Learning and Particle Swarm Optimisation-Based Techniques for Visually Impaired Humans' Text Recognition and Identification

Binay Kumar Pandey, Digvijay Pandey, Subodh Wariya, Gaurav Aggarwal, Rahul Rastogi
Journal: Augmented Human Research, volume 6, issue 1
Publication type: Journal Article
Published: 2021-10-29
DOI: 10.1007/s41133-021-00051-5
URL: https://link.springer.com/article/10.1007/s41133-021-00051-5
Citations: 7

Abstract

Blind people can benefit greatly from a system that localises and reads text embedded in natural scenes and provides useful information, boosting their self-esteem and autonomy in everyday situations. Although existing optical character recognition programmes appear quick and effective, most of them cannot correctly recognise text embedded in ordinary scene images. The methodology described in this paper localises textual image regions and pre-processes them using the naïve Bayesian algorithm. A weighted reading technique is used to generate the correct text data from complex image regions. Images usually contain some noise, so filtering is applied during the early pre-processing step. To restore image quality, the input image is processed using gradient and contrast image methods, and the contrast of the source image is then enhanced using an adaptive image map. The stroke width transform, the Gabor transform, and a weighted naïve Bayesian classifier are applied to complex degraded images for segmentation, feature extraction, and detection of textual and non-textual elements. Finally, a combination of deep neural networks and particle swarm optimisation is used to recognise the text classified in this way. After recognition, the text in the image is converted into audio output. The IIIT5K dataset is used for development, and the performance of the proposed approach is evaluated using accuracy, recall, precision, and F1-score.
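
The four evaluation measures named in the abstract can be computed from the counts of true and false positives and negatives. The sketch below is only an illustration of those standard definitions for a binary text/non-text decision; the function name, the label encoding (1 = text, 0 = non-text), and the sample labels are assumptions made here and do not come from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = text, 0 = non-text)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # text correctly detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-text correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-text flagged as text
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # text that was missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight candidate regions (not data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalises non-text regions that are wrongly read out, while recall penalises text that is missed, so reporting both alongside F1 gives a fuller picture than accuracy alone.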
