Irregular text detection in natural scenes via dilated recombination and efficient reorganization

Multimedia Systems · IF 3.1 · Zone 3, Computer Science · JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) · Pub Date: 2024-06-02 · DOI: 10.1007/s00530-024-01360-6

Liwen Huang, Wenyuan Yang

Citations: 0

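Abstract

In recent years, a growing range of applications has broadened the prospects of scene text detection. For irregular text, however, balancing detection accuracy against real-time speed remains a key consideration. To address this, we propose DENet, an efficient scene text detector that combines a Dilated Recombined Unit (DRU) and an Efficient Reorganized Unit (ERU). Input features first pass through a DR-VanillaNet backbone, in which a DRU is inserted into every block to strengthen the connection between distant pixels. An FPN equipped with the ERU then exploits feature redundancy and partially permutes channels. Together, the DRU and ERU improve precision at the cost of a limited drop in speed. In addition, progressive scale expansion is applied, which preserves the ability to correctly generate adjacent text instances. Experiments on the CTW1500 and Total-Text benchmark datasets show that the model improves precision with only a limited drop in speed: precision reaches 84.29% and 85.30% on the two datasets, at 8.6 and 10.9 FPS, respectively.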

This abstract was machine-translated; where it differs, the English original prevails.

A irregular text detection via dilated recombination and efficient reorganization on natural scene

In recent years, scene text detection has brought out broader prospects via growing applied opportunities. Nevertheless, pointing out which detected capability and suitable instantaneity in equilibrium is an essential consideration of irregular text detection. Out of consideration for the trouble, we propose an efficient scene text detector that unites a Dilated Recombined Unit (DRU) and a Efficient Reorganized Unit (ERU), named DENet. In the beginning, input feature information is received into a DR-VanillaNet backbone. Dilated recombined unit is devised to insert into every block of DR-VanillaNet to heighten the connection about distant pixel points. Next, an FPN with efficient reorganized unit tends to exploit feature redundancy and permutate channels partially. Correspondingly, DRU and ERU work on constructive effect for precision with a limited descent of speed. Moreover, a progressive scale expansion is carried forward which maintains the ability to generate the adjacent instances successfully. Multiple experiments on CTW1500, Total-Text benchmark datasets prove that designed model intends to improve precision accompanied by a limited drop of speed. It is specifically indicated that the value of precision on these two datasets reaches 84.29% and 85.30%. And FPS are achieved by 8.6 and 10.9, respectively.
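The abstract names progressive scale expansion but does not spell out the procedure. The sketch below shows one common way that idea is realized in segmentation-based text detectors: label the connected components of the smallest predicted kernel, then grow those labels breadth-first into each successively larger kernel, so that neighbouring instances which touch at full scale keep separate labels. It is an illustration under assumptions, not DENet's implementation: the function names `label_components` and `progressive_scale_expansion`, the 4-connectivity, and the toy masks are all hypothetical.

```python
from collections import deque

import numpy as np

# 4-connected neighbourhood (an assumption; 8-connectivity is equally possible).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label the 4-connected components of a boolean mask; 0 is background."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel to the largest.

    `kernels` is a list of boolean masks ordered from smallest to largest.
    Contested pixels go to whichever instance reaches them first.
    """
    labels = label_components(kernels[0])
    h, w = labels.shape
    for kernel in kernels[1:]:
        # Seed the breadth-first search with every pixel labelled so far.
        queue = deque((int(y), int(x)) for y, x in zip(*np.nonzero(labels)))
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[cy, cx]
                    queue.append((ny, nx))
    return labels


# Toy example: two seeds whose full-size regions touch in the larger kernel.
small = np.array([[1, 0, 0, 0, 0, 0, 1]], dtype=bool)
large = np.array([[1, 1, 1, 1, 1, 1, 1]], dtype=bool)
labels = progressive_scale_expansion([small, large])
print(labels.tolist())
```

In the toy example the larger kernel is a single connected row, yet the result still holds two instances because each seed claims only the pixels it reaches first. That behaviour is what lets this family of methods keep adjacent text lines apart.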

Source journal

Multimedia Systems (Engineering and Technology; Computer Science: Theory and Methods)

CiteScore: 5.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 4.5 months

About the journal: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.