MEAN: Multi-Element Attention Network for Scene Text Recognition

Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao, Jaesik Min
{"title":"MEAN: Multi - Element Attention Network for Scene Text Recognition","authors":"Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao, Jaesik Min","doi":"10.1109/ICPR48806.2021.9413166","DOIUrl":null,"url":null,"abstract":"Scene text recognition is a challenging problem due to the wide variances in contents, styles, orientations, and image quality of text instances in natural scene images. To learn the intrinsic representation of scene texts, a novel multi-element attention (MEA) mechanism is proposed to exploit geometric structures from local to global levels in feature maps extracted from a scene text image. The MEA mechanism is a generalized form of self-attention technique. The elements in feature maps are taken as the nodes of an undirected graph, and three kinds of adjacency matrices are designed to aggregate information at local, neighborhood and global levels before calculating the attention weights. A multi-element attention network (MEAN) is implemented, which includes a CNN for feature extraction, an encoder with MEA mechanism and a decoder for predicting text codes. Orientational positional encoding is added to feature maps output by the CNN, and a feature vector sequence transformed from the feature maps is used as the input of the encoder. Experimental results show that MEAN has achieved state-of-the-art or competitive performance on seven public English scene text datasets (IIITSk, SVT, IC03, IC13, IC15, SVTP, and CUTE). Further experiments have been conducted on a selected subset of the RCTW Chinese scene text dataset, demonstrating that MEAN can handle horizontal, vertical, and irregular scene text samples.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"24 1","pages":"1-8"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th International Conference on Pattern Recognition (ICPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPR48806.2021.9413166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Scene text recognition is a challenging problem due to the wide variance in content, style, orientation, and image quality of text instances in natural scene images. To learn the intrinsic representation of scene text, a novel multi-element attention (MEA) mechanism is proposed to exploit geometric structures, from the local to the global level, in feature maps extracted from a scene text image. The MEA mechanism is a generalized form of the self-attention technique. The elements of the feature maps are taken as the nodes of an undirected graph, and three kinds of adjacency matrices are designed to aggregate information at the local, neighborhood, and global levels before the attention weights are calculated. A multi-element attention network (MEAN) is implemented, which includes a CNN for feature extraction, an encoder with the MEA mechanism, and a decoder for predicting text codes. Orientational positional encoding is added to the feature maps output by the CNN, and a feature vector sequence transformed from the feature maps is used as the input of the encoder. Experimental results show that MEAN achieves state-of-the-art or competitive performance on seven public English scene text datasets (IIIT5K, SVT, IC03, IC13, IC15, SVTP, and CUTE). Further experiments on a selected subset of the RCTW Chinese scene text dataset demonstrate that MEAN can handle horizontal, vertical, and irregular scene text samples.
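
The abstract describes the MEA mechanism only at a high level, so the following is a minimal PyTorch sketch of one plausible reading: scaled dot-product attention whose weights are restricted by a binary adjacency matrix over the flattened feature-map elements, so that an all-ones (global-level) adjacency reduces to standard self-attention, consistent with the claim that MEA generalizes self-attention. The class name MultiElementAttention, the adjacency helper, and the Chebyshev-radius definition of the neighborhood level are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiElementAttention(nn.Module):
    """Sketch of adjacency-restricted self-attention (a hypothetical
    reading of MEA): attention weights are computed only between
    feature-map elements joined by an edge in the adjacency matrix."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, adj):
        # x:   (B, N, dim) -- N = H*W flattened feature-map elements (graph nodes)
        # adj: (N, N)      -- binary adjacency matrix, 1 = edge between two nodes
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        # suppress attention between non-adjacent nodes before the softmax
        scores = scores.masked_fill(adj == 0, float("-inf"))
        return torch.matmul(F.softmax(scores, dim=-1), v)

def adjacency(h, w, level="global", radius=1):
    """Build an (N, N) adjacency matrix for an h x w feature map.
    The three levels follow the abstract's local / neighborhood / global
    split; the exact definitions here are assumptions."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)  # (N, 2)
    if level == "global":
        return torch.ones(h * w, h * w)  # all-ones: plain self-attention
    # Chebyshev distance between element positions on the feature map
    dist = (coords[:, None, :] - coords[None, :, :]).abs().max(dim=-1).values
    return (dist <= (0 if level == "local" else radius)).float()
```

With adjacency(h, w, "global") the module is ordinary self-attention; the "local" level keeps only each node's self-edge, and "neighborhood" links nodes within a small spatial radius. How the three levels are combined inside the encoder is not specified in the abstract and is left out of this sketch.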