Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition

IEEE Transactions on Image Processing (impact factor 13.7). Published: 2025-01-10. DOI: 10.1109/TIP.2024.3523799

Fanfu Xue, Jiande Sun, Yaqi Xue, Qiang Wu, Lei Zhu, Xiaojun Chang, Sen-Ching Cheung

Volume 34, pages 717-728. IEEE Xplore: https://ieeexplore.ieee.org/document/10838318/

Citations: 0

Abstract

Despite recent advances, scene text recognition remains a challenging problem due to the significant variability, irregularity, and distortion in text appearance and localization. Attention-based methods have become mainstream owing to their superior vocabulary learning and observation ability. Nonetheless, they are susceptible to attention drift, which can lead to word recognition errors. Most works focus on correcting attention drift during decoding but completely ignore the error accumulated during encoding. In this paper, we propose a novel scheme, Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition (ACDS-STR), which mitigates attention drift at the feature encoding stage. At the heart of the scheme is the cross-domain attention guidance and feature encoding fusion module (CAFM), which uses the core areas of characters to recursively guide attention learning during encoding. Building on the precise attention information provided by CAFM, we propose a non-attention-based adaptive transformation decoder (ATD) that ensures decoding performance while improving decoding speed. During training, we combine manual guidance with subjective learning to learn the core areas of characters, which notably improves the recognition performance of the model. Experiments on public benchmarks show state-of-the-art performance. The source code will be available at https://github.com/xuefanfu/ACDS-STR.
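The abstract does not give the form of the supervision signal, so the following is only a minimal sketch of the general idea of guiding encoder attention with a character core-area mask. It assumes a 2-D grid of attention logits and a binary mask marking core-area pixels, and it uses a KL-divergence penalty as a stand-in; the function names (`softmax_2d`, `mask_to_target`, `attention_guidance_loss`) and the choice of KL divergence are hypothetical illustrations, not the ACDS-STR implementation.

```python
import math
from typing import List

Grid = List[List[float]]


def softmax_2d(logits: Grid) -> Grid:
    """Normalize a 2-D grid of attention logits into a distribution over positions."""
    flat = [v for row in logits for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(v for row in exps for v in row)
    return [[v / total for v in row] for row in exps]


def mask_to_target(mask: Grid) -> Grid:
    """Hypothetical: spread unit mass uniformly over the core-area pixels of a binary mask."""
    total = sum(v for row in mask for v in row)
    if total <= 0:
        raise ValueError("mask must contain at least one core-area pixel")
    return [[v / total for v in row] for row in mask]


def attention_guidance_loss(logits: Grid, mask: Grid, eps: float = 1e-8) -> float:
    """Assumed guidance loss KL(target || attention).

    The value is small when attention mass sits on the character core area
    and grows as attention drifts away from it.
    """
    attn = softmax_2d(logits)
    target = mask_to_target(mask)
    loss = 0.0
    for t_row, a_row in zip(target, attn):
        for t, a in zip(t_row, a_row):
            if t > 0:  # positions outside the core area contribute nothing to KL(target || attn)
                loss += t * (math.log(t) - math.log(a + eps))
    return loss


if __name__ == "__main__":
    core = [[0, 1, 0], [0, 1, 0]]      # core area is the middle column
    focused = [[0, 5, 0], [0, 5, 0]]   # attention peaks on the core area
    drifted = [[5, 0, 0], [5, 0, 0]]   # attention has drifted to the left column
    print(attention_guidance_loss(focused, core))
    print(attention_guidance_loss(drifted, core))
```

In this sketch the focused map yields a much smaller loss than the drifted one, which is the behavior a core-area guidance signal is meant to reward. Applying such a term to the attention maps of several encoder layers would be one way to read "recursively guide attention", but that reading is likewise an assumption.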
