Scene text detection using structured information and an end-to-end trainable generative adversarial networks

IF 3.7 · CAS Tier 4 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pattern Analysis and Applications · Pub Date: 2024-03-19 · DOI: 10.1007/s10044-024-01259-y
Palanichamy Naveen, Mahmoud Hassaballah
Citations: 0

Abstract

Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Robustness, accuracy, and efficiency in this task are vital for applications such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and effective text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. The VAE module generates diverse and variable text regions; the GAN module then refines and enhances these regions for greater realism and accuracy; finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss: the VAE loss encourages diversity in the generated text regions, the GAN loss promotes realism and accuracy, and the text detection loss drives high-precision identification of text regions. The VAE module employs an encoder-decoder structure, and the GAN module a generator-discriminator structure. Rigorous testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
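The abstract states that training minimizes a joint loss combining the VAE, GAN, and text detection terms, but does not give the exact form of each term. As a hedged illustration only, the objective might look like a weighted sum of a standard VAE loss (reconstruction plus KL divergence), a binary cross-entropy GAN loss, and a confidence-score detection loss. All function names, the weights `w_vae`/`w_gan`/`w_det`, and the per-term definitions below are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    # Assumed form: mean-squared reconstruction error plus KL divergence
    # between the approximate posterior N(mu, sigma^2) and N(0, I).
    recon = np.mean((x - x_recon) ** 2)
    kl = -0.5 * np.mean(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

def gan_loss(d_real, d_fake):
    # Assumed form: binary cross-entropy for the discriminator, pushing
    # real samples toward 1 and generated samples toward 0.
    eps = 1e-8
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def detection_loss(conf_pred, conf_true):
    # Assumed form: binary cross-entropy over per-region
    # text/no-text confidence scores.
    eps = 1e-8
    return -np.mean(conf_true * np.log(conf_pred + eps)
                    + (1.0 - conf_true) * np.log(1.0 - conf_pred + eps))

def joint_loss(x, x_recon, mu, log_var, d_real, d_fake,
               conf_pred, conf_true, w_vae=1.0, w_gan=1.0, w_det=1.0):
    # Weighted sum of the three terms, as the abstract describes;
    # the weights are hypothetical hyperparameters.
    return (w_vae * vae_loss(x, x_recon, mu, log_var)
            + w_gan * gan_loss(d_real, d_fake)
            + w_det * detection_loss(conf_pred, conf_true))
```

In this sketch, a perfect reconstruction with a standard-normal posterior drives the VAE term to zero, while confident and correct region scores drive the detection term toward zero, so the joint value reflects how far each module is from its target.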


Source journal: Pattern Analysis and Applications (Engineering & Technology · Computer Science: Artificial Intelligence)
CiteScore: 7.40
Self-citation rate: 2.60%
Articles per year: 76
Review time: 13.5 months
Aims and scope: The journal publishes high-quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research that describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal also publishes articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations that are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition, and uncertainty management in applied pattern recognition are particularly solicited.