Hongchao Gao, Yujia Li, Jiao Dai, Xi Wang, Jizhong Han, Ruixuan Li
{"title":"Multi-granularity Deep Local Representations for Irregular Scene Text Recognition","authors":"Hongchao Gao, Yujia Li, Jiao Dai, Xi Wang, Jizhong Han, Ruixuan Li","doi":"10.1145/3446971","DOIUrl":null,"url":null,"abstract":"Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem and most networks capture the sequence only from a single-granularity visual representation, which to some extent limits the performance of recognition. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, and each block contains a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. The extensive experiments show that our proposed network achieves the state-of-the-art performance on several benchmark datasets including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR datasets under shorter training time.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 1","pages":"1 - 18"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3446971","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IMS transactions on data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3446971","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem and most networks capture the sequence only from a single-granularity visual representation, which to some extent limits the performance of recognition. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, and each block contains a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. The extensive experiments show that our proposed network achieves the state-of-the-art performance on several benchmark datasets including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR datasets under shorter training time.