Arbitrary shape text detection fusing InceptionNeXt and multi-scale attention mechanism

Xianguo Li, Yu Zhang, Yi Liu, Xingchen Yao, Xinyi Zhou
The Journal of Supercomputing, published 2024-08-12. DOI: 10.1007/s11227-024-06418-w

Abstract

Existing segmentation-based text detection methods generally suffer from insufficient receptive fields, inadequate filtering of text information, and difficulty balancing detection accuracy against speed, which limits their ability to detect arbitrary-shaped text in complex backgrounds. To address these problems, we propose a new text detection method that fuses the pure ConvNet model InceptionNeXt with a multi-scale attention mechanism. First, we propose a text information reinforcement module that extracts effective text information from features at different scales while preserving spatial position information. Second, we construct an InceptionNeXt Block module that compensates for insufficient receptive fields without significantly reducing speed. Finally, we design the INA-DBNet network structure to fuse local and global features and balance accuracy and speed. Experimental results demonstrate the efficacy of our method: on the MSRA-TD500 and Total-Text datasets, INA-DBNet achieves F-measures of 91.3% and 86.7%, respectively, while maintaining real-time inference speed. Code is available at: https://github.com/yuyu678/INANET.
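To make the receptive-field versus speed trade-off more concrete, the following is a minimal sketch that assumes the InceptionNeXt Block reuses the Inception depthwise convolution from the original InceptionNeXt design: an identity branch plus 3×3, 1×11 and 11×1 depthwise branches, each applied to 1/8 of the channels. It only counts channels and parameters. The names `InceptionDWConfig`, `channel_split`, `depthwise_params` and `inception_dw_params`, as well as the 64-channel example, are hypothetical, and INA-DBNet's actual block may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InceptionDWConfig:
    """Hypothetical config mirroring InceptionNeXt-style defaults (assumption)."""
    channels: int
    square_kernel: int = 3       # small square depthwise branch
    band_kernel: int = 11        # long 1xk and kx1 band branches
    branch_ratio: float = 0.125  # fraction of channels per convolutional branch


def channel_split(cfg: InceptionDWConfig) -> tuple[int, int, int, int]:
    """Return channel counts for (identity, square, horizontal band, vertical band)."""
    gc = int(cfg.channels * cfg.branch_ratio)
    return (cfg.channels - 3 * gc, gc, gc, gc)


def depthwise_params(channels: int, kh: int, kw: int) -> int:
    """Weights plus biases of a depthwise conv: one kh x kw filter and one bias per channel."""
    return channels * (kh * kw + 1)


def inception_dw_params(cfg: InceptionDWConfig) -> int:
    """Parameters of the three convolutional branches; the identity branch has none."""
    _, gc, _, _ = channel_split(cfg)
    k, b = cfg.square_kernel, cfg.band_kernel
    return (depthwise_params(gc, k, k)      # k x k square branch
            + depthwise_params(gc, 1, b)    # 1 x b horizontal band
            + depthwise_params(gc, b, 1))   # b x 1 vertical band


if __name__ == "__main__":
    cfg = InceptionDWConfig(channels=64)
    print("split:", channel_split(cfg))                      # (40, 8, 8, 8)
    print("inception DW params:", inception_dw_params(cfg))  # 272
    # For comparison: a plain 11x11 depthwise conv over all 64 channels
    print("11x11 DW params:", depthwise_params(64, 11, 11))  # 7808
```

Under these assumptions the three convolutional branches need about 3.5% of the parameters of a full 11×11 depthwise convolution (272 vs. 7808), while the band branches still reach 11 pixels horizontally and vertically. Parameter count is only a rough proxy for inference latency.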

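The F-measure reported in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch that assumes detections have already been matched to ground-truth regions; the benchmark-specific matching rules for MSRA-TD500 and Total-Text are not shown. The function name `f_measure` and the counts in the example are hypothetical and are not results from the paper.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from matched detection counts."""
    detected = true_pos + false_pos    # all predicted text regions
    annotated = true_pos + false_neg   # all ground-truth text regions
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / annotated if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    p, r, f = f_measure(true_pos=90, false_pos=10, false_neg=20)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.900 R=0.818 F=0.857
```

Because F = 2TP / (2TP + FP + FN), false positives and false negatives are penalized equally.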
