Turkish scene text recognition: Introducing extensive real and synthetic datasets and a novel recognition model

Journal: Engineering Science and Technology-An International Journal-Jestech | IF 5.4 | CAS Zone 2 (Engineering Technology) | JCR Q1 (ENGINEERING, MULTIDISCIPLINARY) | Pub Date: 2024-12-01 | Epub Date: 2024-11-08 | DOI: 10.1016/j.jestch.2024.101881

Serdar Yıldız

Volume 60, Article 101881. Full text: https://www.sciencedirect.com/science/article/pii/S2215098624002672


Abstract

Scene text recognition (STR) is gaining prominence in computer vision. Despite this progress, comprehensive studies and suitable datasets for STR remain lacking, particularly for languages such as Turkish. Existing datasets, regardless of language, tend to suffer from small sample counts and high noise levels, which considerably restrict progress in STR research and applications. Addressing these shortcomings, we introduce the Turkish Scene Text Recognition (TS-TR) dataset, one of the most substantial STR datasets to date, comprising 7288 text instances. In addition, we propose the Synthetic Turkish Scene Text Recognition (STS-TR) dataset, a collection of 12 million samples created with a novel histogram-based method that is more efficient than common synthetic data generation methods. Moreover, we present a novel recognition model, the Masked Vision Transformer for Text Recognition (MViT-TR), which achieves a word accuracy of 94.42% on the challenging TS-TR test set, underlining its robustness. We also investigate how synthetic datasets, patch masking, and the position attention module affect recognition performance. To foster future STR research, we have made all datasets and source code publicly available.
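The abstract reports results as word accuracy. As a rough illustration only, word accuracy is commonly computed as the fraction of predicted strings that match their ground-truth label exactly. The sketch below assumes a case-sensitive exact match with no text normalization; the abstract does not state the evaluation protocol actually used, and the function name `word_accuracy` and the sample words are hypothetical.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions that equal their target string exactly.

    Assumption: case-sensitive comparison with no normalization. The
    protocol used in the paper may differ.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    # A word counts as correct only if every character matches.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Hypothetical predictions and labels, for illustration only.
    preds = ["ÇIKIŞ", "eczane", "gıda", "ISTANBUL"]
    labels = ["ÇIKIŞ", "eczane", "gıda", "İSTANBUL"]
    print(f"{100 * word_accuracy(preds, labels):.2f}%")  # prints 75.00%
```

The last pair differs only in the dotted capital İ, which shows why a single wrong character makes the whole word count as an error under this metric.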




Source journal

Engineering Science and Technology-An International Journal-Jestech (subject area: Materials Science, Electronic, Optical and Magnetic Materials)

CiteScore: 11.20

Self-citation rate: 3.50%

Articles published: 153

Review time: 22 days

About the journal: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal. It publishes theoretical and experimental high-quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH covers a wide spectrum of subjects, including:

-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)

-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)

-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)