Turkish scene text recognition: Introducing extensive real and synthetic datasets and a novel recognition model

Impact factor: 5.1; Zone 2 (Engineering and Technology); JCR Q1 (Engineering, Multidisciplinary); Engineering Science and Technology-An International Journal-Jestech; Pub Date: 2024-11-08; DOI: 10.1016/j.jestch.2024.101881
Serdar Yıldız
Engineering Science and Technology-An International Journal-Jestech, vol. 60, Article 101881. Journal Article. https://www.sciencedirect.com/science/article/pii/S2215098624002672

Abstract

Scene text recognition (STR) has steadily gained prominence within computer vision. Despite this progress, a comprehensive study or suitable dataset for STR is still lacking, particularly for languages such as Turkish. Existing datasets, regardless of language, tend to suffer from small sample sizes and high noise levels, which considerably restrict progress in STR research and applications. To address these shortcomings, we introduce the Turkish Scene Text Recognition (TS-TR) dataset, one of the most substantial STR datasets to date, comprising 7288 text instances. We also propose the Synthetic Turkish Scene Text Recognition (STS-TR) dataset, a collection of 12 million samples created with a novel histogram-based method that is more efficient than common synthetic data generation methods. In addition, we present a novel recognition model, the Masked Vision Transformer for Text Recognition (MViT-TR), which achieves a word accuracy of 94.42% on the challenging TS-TR test dataset. We further investigate how synthetic datasets, patch masking, and the position attention module affect recognition performance. To foster future STR research, we have made all datasets and source code publicly available.
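The abstract reports results as word accuracy. As a rough illustration of how such a figure is usually computed, the sketch below counts a prediction as correct only when it matches the ground-truth string exactly. The function name `word_accuracy` and the sample words are hypothetical, and the abstract does not say whether the paper's evaluation is case-sensitive or applies any normalization, so exact matching is an assumption here.

```python
def word_accuracy(predictions, references):
    """Fraction of predictions that equal their reference string exactly.

    Assumption: exact, case-sensitive matching with no normalization.
    The abstract does not specify the paper's evaluation protocol.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels and model outputs, made up for illustration only.
refs = ["ÇIKIŞ", "ECZANE", "İSTANBUL", "KAPALI"]
preds = ["ÇIKIŞ", "ECZANE", "ISTANBUL", "KAPALI"]  # third word: dotless I instead of dotted İ

acc = word_accuracy(preds, refs)
print(f"{acc:.2%}")  # prints 75.00%
```

Turkish distinguishes dotted İ/i from dotless I/ı, so under exact matching a single missing dot makes the whole word count as wrong, as the third sample shows.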
Source journal
Engineering Science and Technology, an International Journal (JESTECH)
Subject category: Materials Science - Electronic, Optical and Magnetic Materials
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published: 153
Review time: 22 days
Journal description: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal. It publishes theoretical and experimental papers of high quality and permanent interest, not previously published in journals, in engineering and applied science, with the aim of promoting the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews, and communications in the broadly defined field of engineering science and technology. The scope of JESTECH covers a wide spectrum of subjects, including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)