Towards An Accurate and Effective Printed Document Reader for Visually Impaired People

Anh Phan Viet, Dung Le Duy, Van Anh Tran Thi, Hung Pham Duy, Truong Vu Van, L. Bui
Published in: 2022 14th International Conference on Knowledge and Systems Engineering (KSE)
Publication date: 2022-10-19
DOI: 10.1109/KSE56063.2022.9953768 (https://doi.org/10.1109/KSE56063.2022.9953768)

Abstract

This paper introduces a solution that helps visually impaired or blind (VIB) people access printed and electronic documents independently. Its main strengths are cost-effectiveness and accuracy: text extraction and read-aloud are performed entirely by a smartphone application. To make the application usable by VIB people, we leverage advanced image and speech processing technologies to improve both the user experience and the accuracy of image-to-text conversion. To build optical character recognition (OCR) models that remain accurate on low-quality images, we combine several techniques: 1) generating a large, balanced dataset with varied backgrounds, 2) correcting image distortion and orientation, and 3) applying a sequence-to-sequence model with transformers as the encoder. For ease of use, the text-to-speech (TTS) model generates voice instructions at every interaction, and the interface was designed and adjusted according to user feedback. A test on a set of scanned documents showed that the OCR model reaches a character-level accuracy of 98.6% and that the TTS model produces fluent speech. A trial with VIB people indicated that the application helps them read printed documents conveniently, and the wide availability of smartphones makes it an affordable solution.
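The abstract reports OCR accuracy "by characters" but does not say how that figure is computed. One common convention, assumed here purely for illustration, is one minus the character edit distance divided by the reference length. The function names `edit_distance` and `char_accuracy` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character-level accuracy under the assumed definition 1 - distance / len(ref)."""
    if not ref:
        return 1.0 if not hyp else 0.0
    # Clamp at 0 because a very long hypothesis can exceed len(ref) edits.
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    # A typical OCR confusion: "m" read as "rn" costs two edits over 16 characters.
    print(char_accuracy("printed document", "printed docurnent"))  # 0.875
```

Under this definition, a score of 98.6% would correspond to roughly 14 character edits per 1,000 reference characters. The paper may define or aggregate the metric differently.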