Multi-Scale Channel Attention for Chinese Scene Text Recognition

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition Pub Date : 2022-11-17 DOI:10.1145/3581807.3581808

Haiqing Liao, X. Du, Yun Wu, Da-Han Wang

{"title":"Multi-Scale Channel Attention for Chinese Scene Text Recognition","authors":"Haiqing Liao, X. Du, Yun Wu, Da-Han Wang","doi":"10.1145/3581807.3581808","DOIUrl":null,"url":null,"abstract":"Scene text recognition have proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed for handling scene texts with perspective distortion and curve shape. Nevertheless, most of these methods only consider single-scale features while not taking multi-scale features into account. Meanwhile, the existing text recognition methods are mainly used for English texts, whereas ignoring Chinese texts' pivotal role. In this paper, we proposed an end-to-end method to integrate multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopted and customized the Dense Atrous Spatial Pyramid Pooling (DenseASPP) to our backbone network to capture multi-scale features of the input image while simultaneously extending the receptive fields. Moreover, we added Squeeze-and-Excitation Networks (SE) to capture attentional features with global information to improve the performance of CSTR further. The experimental results of the Chinese scene text datasets demonstrate that the proposed method can efficiently mitigate the impacts of the loss of contextual information caused by the text scale varying and outperforms the state-of-the-art approaches.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"13 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581808","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Scene text recognition have proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed for handling scene texts with perspective distortion and curve shape. Nevertheless, most of these methods only consider single-scale features while not taking multi-scale features into account. Meanwhile, the existing text recognition methods are mainly used for English texts, whereas ignoring Chinese texts' pivotal role. In this paper, we proposed an end-to-end method to integrate multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopted and customized the Dense Atrous Spatial Pyramid Pooling (DenseASPP) to our backbone network to capture multi-scale features of the input image while simultaneously extending the receptive fields. Moreover, we added Squeeze-and-Excitation Networks (SE) to capture attentional features with global information to improve the performance of CSTR further. The experimental results of the Chinese scene text datasets demonstrate that the proposed method can efficiently mitigate the impacts of the loss of contextual information caused by the text scale varying and outperforms the state-of-the-art approaches.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

中文场景文本识别的多尺度通道关注

场景文本识别已被证明是解决各种计算机视觉任务的高效方法。近年来，人们提出了许多基于编码器-解码器框架的识别算法来处理具有视角失真和曲线形状的场景文本。然而，这些方法大多只考虑单尺度特征，而没有考虑多尺度特征。同时，现有的文本识别方法主要用于英文文本，忽略了中文文本的关键作用。本文提出了一种端到端融合多尺度特征的中文场景文本识别方法。具体来说，我们在骨干网中采用并定制了密集空间金字塔池(DenseASPP)来捕获输入图像的多尺度特征，同时扩展接收野。此外，为了进一步提高CSTR的性能，我们还增加了挤压-激励网络(SE)来捕获具有全局信息的注意力特征。中文场景文本数据集的实验结果表明，该方法能够有效地减轻文本尺度变化带来的上下文信息丢失的影响，优于现有的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

自引率

0.00%

发文量

期刊最新文献

Multi-Scale Channel Attention for Chinese Scene Text Recognition Vehicle Re-identification Based on Multi-Scale Attention Feature Fusion Comparative Study on EEG Feature Recognition based on Deep Belief Network VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention Traffic Flow Forecasting Research Based on Delay Reconstruction and GRU-SVR