Multi-Scale Channel Attention for Chinese Scene Text Recognition
Haiqing Liao, X. Du, Yun Wu, Da-Han Wang
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17
https://doi.org/10.1145/3581807.3581808
Scene text recognition has proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed for handling scene text with perspective distortion and curved shapes. Nevertheless, most of these methods consider only single-scale features and do not take multi-scale features into account. Meanwhile, existing text recognition methods mainly target English text and overlook the pivotal role of Chinese text. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adapt Dense Atrous Spatial Pyramid Pooling (DenseASPP) to our backbone network to capture multi-scale features of the input image while simultaneously enlarging the receptive field. Moreover, we add Squeeze-and-Excitation (SE) networks to capture attentional features with global information and further improve CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method effectively mitigates the loss of contextual information caused by varying text scale and outperforms state-of-the-art approaches.
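The two ingredients named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the bias-free layers, the weight values passed in, and the dilation rates used in the usage note are all assumptions made for illustration. The first function follows the generic squeeze-and-excitation recipe (global average pooling, a bottleneck layer with ReLU, an expansion layer with a sigmoid gate, then channel-wise rescaling). The second computes how the receptive field grows when stride-1 dilated (atrous) convolutions are applied in sequence, which is the effect a DenseASPP-style stack relies on.

```python
import math


def se_channel_attention(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation to a feature map shaped [C][H][W].

    w_reduce: [C_r][C] weights of the bottleneck layer (hypothetical values).
    w_expand: [C][C_r] weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: reweight every spatial position of a channel by that channel's gate.
    rescaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return rescaled, gates


def stacked_receptive_field(kernel_size, dilation_rates):
    """Per-axis receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for rate in dilation_rates:
        # A dilated kernel spans rate * (kernel_size - 1) + 1 input positions,
        # so each layer widens the field by rate * (kernel_size - 1).
        field += rate * (kernel_size - 1)
    return field
```

As a usage example, a two-channel 2x2 feature map with a one-unit bottleneck, `se_channel_attention(fm, [[0.5, 0.5]], [[1.0], [-1.0]])`, returns one gate per channel, and the first channel is emphasised over the second. Calling `stacked_receptive_field(3, [3, 6, 12, 18, 24])` shows how a chain of 3x3 atrous layers with those assumed rates covers a far wider context than any single layer, which is why such a stack helps with text whose scale varies.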