Vietnamese Scene Text Detection and Recognition using Deep Learning: An Empirical Study

2022 6th International Conference on Green Technology and Sustainable Development (GTSD). Pub Date: 2022-07-29. DOI: 10.1109/GTSD54989.2022.9989248

Nhat Truong Pham, Van Dung Pham, Qui Nguyen-Van, Bao Hung Nguyen, Duc Ngoc Minh Dang, Sy Dzung Nguyen



Abstract

Scene text detection and recognition are important and challenging computer vision tasks that aim to locate and transcribe sequences of text in natural scenes. Many state-of-the-art methods have recently been investigated to improve the accuracy and efficiency of text detection and recognition. However, little research has addressed text detection and recognition in natural scenes in Vietnam. This paper presents a deep learning-based empirical investigation of Vietnamese scene text detection and recognition. First, four detection models, namely the differentiable binarization network (DBN), pyramid mask text detector (PMTD), pixel aggregation network (PAN), and Fourier contour embedding network (FCEN), are employed to detect text regions in images. Then, four text recognition models, namely the convolutional recurrent neural network (CRNN), self-attention text recognition network (SATRN), no-recurrence sequence-to-sequence text recognizer (NRTR), and RobustScanner (RS), are investigated for recognizing the text. In addition, data augmentation methods are applied to enrich the data and improve the accuracy of scene text detection and recognition. The models are evaluated on the VinText dataset. Empirical results show that PMTD and SATRN achieve the highest scores for text detection and recognition, respectively. For knowledge sharing, our implementation is publicly available at https://github.com/ThorPham/VN_scene_text_detection_recognition.
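
As a reading aid, the two-stage structure described in the abstract (a detector proposes text regions, then a recognizer transcribes each region) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `TextInstance`, `crop`, `read_scene_text`, `dummy_detect`, and `dummy_recognize` are hypothetical, boxes are assumed to be axis-aligned rectangles, and the stand-in models only exist so the sketch runs without trained weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration; not taken from the authors' repository.
Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max), max exclusive
Image = Sequence[Sequence[int]]      # toy grayscale image as rows of pixels


@dataclass
class TextInstance:
    box: Box
    text: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the rectangular region described by box out of the image."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def read_scene_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextInstance]:
    """Stage 1 finds text regions; stage 2 transcribes each cropped region."""
    return [
        TextInstance(box, recognize(crop(image, box)))
        for box in detect(image)
    ]


# Stand-in models (assumptions): a real system would plug in a trained
# detector and a trained recognizer here.
def dummy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def dummy_recognize(patch: Image) -> str:
    # Concatenate pixel values so the output depends on the cropped region.
    return "".join(str(pixel) for row in patch for pixel in row)


toy_image = [[0, 1, 2], [3, 4, 5]]
results = read_scene_text(toy_image, dummy_detect, dummy_recognize)
```

Passing the detector and recognizer as plain callables mirrors the study's design of comparing several models per stage: any detector can be paired with any recognizer without changing the pipeline function.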


