Deep Learning and Particle Swarm Optimisation-Based Techniques for Visually Impaired Humans' Text Recognition and Identification
Authors: Binay Kumar Pandey, Digvijay Pandey, Subodh Wariya, Gaurav Aggarwal, Rahul Rastogi
Journal: Augmented Human Research, 6(1), published 2021-10-29
DOI: 10.1007/s41133-021-00051-5
URL: https://link.springer.com/article/10.1007/s41133-021-00051-5
Citations: 7
Abstract
Blind people can benefit greatly from a system that localises and reads text embedded in natural scenes and provides useful information that boosts their self-esteem and autonomy in everyday situations. Although existing optical character recognition programmes appear to be quick and effective, most of them cannot correctly recognise text embedded in ordinary natural-scene images. The methodology described in this paper localises textual image regions and pre-processes them using the naïve Bayesian algorithm. A weighted reading technique is used to generate the correct text data from complicated image regions. Because images usually contain some noise, filtering is applied during the early pre-processing step. To restore image quality, the input image is processed using gradient and contrast image methods, and the contrast of the source image is then enhanced using an adaptive image map. The stroke width transform, the Gabor transform, and a weighted naïve Bayesian classifier are applied to complicated degraded images for segmentation, feature extraction, and detection of textual and non-textual elements. Finally, a combination of deep neural networks and particle swarm optimisation is used to recognise the text classified in the previous step. After recognition, the text in the image is converted into audio output. The IIIT5K dataset is used for development, and the performance of the proposed approach is evaluated using accuracy, recall, precision, and F1-score.
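The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary text/non-text labelling of image regions (1 = text, 0 = non-text); the abstract does not say whether the authors score regions, characters, or words, so the function name `evaluate` and the sample labels below are hypothetical and are not taken from the paper.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: 1 marks a text region and 0 a non-text region.
    The labelling granularity is illustrative, not the paper's.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for eight regions.
    truth = [1, 1, 1, 0, 0, 1, 0, 0]
    predicted = [1, 0, 1, 0, 1, 1, 0, 0]
    print(evaluate(truth, predicted))
```

Precision penalises non-text regions mistaken for text, recall penalises missed text regions, and F1 is their harmonic mean, so reporting all three alongside accuracy gives a fuller picture when text and non-text regions are unevenly represented.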