Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova
{"title":"Outline Generation Transformer for Bilingual Scene Text Recognition","authors":"Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova","doi":"10.23919/MVA57639.2023.10216107","DOIUrl":null,"url":null,"abstract":"We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). As most STR approaches focus on English, we consider both English and Chinese as Chinese is also a major language, and it is a common scene in many areas/countries where both languages can be seen. The OGT consists of an Outline Generator (OG) and a transformer with a language model embedded. The OG detects the character outline of the text and embeds the outline features into a transformer with the outline-query cross-attention layer to better locate each character and enhance the text recognition performance. The training of OGT has two phases, one is training on synthetic data where the text outline masks are made available, followed by the other training on real data where the text outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications (MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10216107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). As most STR approaches focus on English, we consider both English and Chinese as Chinese is also a major language, and it is a common scene in many areas/countries where both languages can be seen. The OGT consists of an Outline Generator (OG) and a transformer with a language model embedded. The OG detects the character outline of the text and embeds the outline features into a transformer with the outline-query cross-attention layer to better locate each character and enhance the text recognition performance. The training of OGT has two phases, one is training on synthetic data where the text outline masks are made available, followed by the other training on real data where the text outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.