Towards An Accurate and Effective Printed Document Reader for Visually Impaired People

2022 14th International Conference on Knowledge and Systems Engineering (KSE), Pub Date: 2022-10-19, DOI: 10.1109/KSE56063.2022.9953768

Anh Phan Viet, Dung Le Duy, Van Anh Tran Thi, Hung Pham Duy, Truong Vu Van, L. Bui



Abstract

This paper introduces a solution that helps visually impaired or blind (VIB) people access printed and electronic documents independently. The solution's main strengths are cost-effectiveness and accuracy. Text extraction and read-aloud are performed entirely by a smartphone application. To make the application usable by VIB people, advanced image and speech processing technologies are leveraged to improve both the user experience and the accuracy of image-to-text conversion. To build optical character recognition (OCR) models that remain accurate on low-quality images, we combine several techniques, including 1) generating a large, balanced dataset with varied backgrounds, 2) correcting distortion and orientation, and 3) applying a sequence-to-sequence model with transformers as the encoder. For ease of use, the text-to-speech (TTS) model generates voice instructions at every interaction, and the interface is designed and adjusted according to user feedback. A test on a set of scanned documents showed that the OCR model reaches a character-level accuracy of 98.6% and that the TTS model produces fluent speech. A trial with VIB people indicated that the application helps them read printed documents conveniently, and it is an affordable solution given the popularity of smartphones.
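The abstract reports OCR accuracy "by characters" but does not say how the figure is computed. A common convention, assumed here and not confirmed by the paper, is one minus the character error rate, where errors are counted as the Levenshtein edit distance between the reference text and the OCR output. The function names `edit_distance` and `char_accuracy` are hypothetical and serve only as a minimal sketch of that convention.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed metric: 1 - (total edit errors / total reference characters)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference texts are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 1.0 - total_errors / total_chars


if __name__ == "__main__":
    # One missing diacritic out of eight reference characters.
    print(char_accuracy(["xin chào"], ["xin chao"]))  # 0.875
```

Under this definition the score can drop below zero when the output contains many spurious insertions, so some evaluations clip it at zero. Whether the paper does so is not stated.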


