Vietnamese Scene Text Detection and Recognition using Deep Learning: An Empirical Study

Nhat Truong Pham, Van Dung Pham, Qui Nguyen-Van, Bao Hung Nguyen, Duc Ngoc Minh Dang, Sy Dzung Nguyen
Published in: 2022 6th International Conference on Green Technology and Sustainable Development (GTSD), 29 July 2022. DOI: 10.1109/GTSD54989.2022.9989248

Abstract

Scene text detection and recognition are vital yet challenging tasks in computer vision. Their goal is to locate and read sequences of text in natural scenes. Recently, many state-of-the-art methods have been proposed to improve the accuracy and efficiency of text detection and recognition. However, little research has addressed text detection and recognition in Vietnamese natural scenes. In this paper, we present an empirical study of deep learning-based Vietnamese scene text detection and recognition. First, four detection models are employed to detect text regions in the images: the differentiable binarization network (DBN), the pyramid mask text detector (PMTD), the pixel aggregation network (PAN), and the Fourier contour embedding network (FCEN). Then, four text recognition models are investigated to recognize the detected text: the convolutional recurrent neural network (CRNN), the self-attention text recognition network (SATRN), the no-recurrence sequence-to-sequence text recognizer (NRTR), and RobustScanner (RS). In addition, data augmentation methods are applied to enrich the training data and thereby improve the accuracy of scene text detection and recognition. The VinText dataset is used to evaluate the effectiveness of the detection and recognition models. Empirical results show that PMTD and SATRN achieve the highest scores for text detection and recognition, respectively. For knowledge sharing, our implementation is publicly available at https://github.com/ThorPham/VN_scene_text_detection_recognition.
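The two-stage design summarized in the abstract can be sketched as follows. A detector first localizes text regions, and a recognizer then reads each region. This is a minimal, hypothetical sketch and not the authors' implementation. The `Image` and `Box` types, `crop`, `spot_text`, and the toy stand-ins are all illustrative assumptions. Boxes are assumed to be axis-aligned, although detectors such as PAN or FCEN can output arbitrary polygons. In practice, trained models such as PMTD and SATRN would replace the toy callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an "image" is a 2-D grid of pixel values, and a box is
# (x0, y0, x1, y1) in pixel coordinates with exclusive upper bounds.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


@dataclass
class TextInstance:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the axis-aligned region described by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def spot_text(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[TextInstance]:
    """Stage 1 detects text regions; stage 2 recognizes the text in each crop."""
    results = []
    for box in detect(image):
        results.append(TextInstance(box=box, text=recognize(crop(image, box))))
    return results


# Toy stand-ins (hypothetical) for a trained detector and recognizer.
def toy_detect(img: Image) -> List[Box]:
    return [(0, 0, 4, 2), (4, 2, 8, 4)]


def toy_recognize(patch: Image) -> str:
    # Report the crop size as "widthxheight" instead of reading real text.
    return f"{len(patch[0])}x{len(patch)}"


toy_image = [[0] * 8 for _ in range(4)]
print(spot_text(toy_image, toy_detect, toy_recognize))
```

Keeping the detector and recognizer as interchangeable callables mirrors the study's setup. Each of the four detectors can be paired with each of the four recognizers without changing the pipeline code.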