Scene Text Image Superresolution Through Multiscale Interaction of Structural and Semantic Priors

Zhongjie Zhu;Lei Zhang;Yongqiang Bai;Yuer Wang;Pei Li
DOI: 10.1109/TAI.2024.3375836
Journal: IEEE Transactions on Artificial Intelligence
Published: 2024-03-19 (Journal Article)
Source: https://ieeexplore.ieee.org/document/10473520/
Citations: 0

Abstract

Scene text image superresolution (STISR) aims to enhance the resolution of images containing text within a scene, making the text more readable and easier to recognize. This technique has broad applications in numerous fields such as autonomous driving, document scanning, image retrieval, and so on. However, most existing STISR methods have not fully exploited the multiscale structural and semantic information within scene text images. As a result, the restored text image quality is not sufficient, significantly impacting subsequent tasks such as text detection and recognition. Hence, this article proposes a novel scheme that leverages multiscale structural and semantic priors to efficiently guide text semantic restoration, ultimately yielding high-quality text images. First, a multiscale interaction attention (MSIA) module is designed to capture location-specific details of various-scale structural features and facilitate the recovery of semantic information. Second, a multiscale prior learning module (MSPLM) is developed. Within this module, skip connections are employed among codecs to strengthen both structural and semantic prior features, thereby enhancing the up-sampling and reconstruction capabilities. Finally, building upon the MSPLM, cascaded encoders are connected through residual connections to further enrich the multiscale features and bolster the representational capacity of the prior. Experiments conducted on the standard TextZoom dataset demonstrate that the average recognition accuracies of three evaluators—attentional scene text recognizer (ASTER), convolutional recurrent neural network (CRNN), and multi-object rectified attention network (MORAN)—are 64.4%, 53.5%, and 60.8%, respectively, surpassing most existing methods, including the state-of-the-art ones.
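The abstract does not give implementation details for the MSIA module, so the following is only a hypothetical, pure-Python sketch of the general idea behind multiscale attention fusion: features are pooled at several scales, each scale receives a softmax attention weight, and the weighted scales are fused back at full resolution. All function names (`avg_pool`, `upsample`, `multiscale_attention_fuse`) and the 1-D toy setting are my own illustrative assumptions, not the paper's design.

```python
import math

def avg_pool(xs, k):
    # Downsample a 1-D feature sequence by averaging non-overlapping windows of size k.
    return [sum(xs[i:i + k]) / len(xs[i:i + k]) for i in range(0, len(xs), k)]

def upsample(xs, n):
    # Nearest-neighbour upsample a 1-D sequence back to length n.
    k = n // len(xs)
    out = []
    for v in xs:
        out.extend([v] * k)
    return out[:n]

def softmax(zs):
    # Numerically stable softmax over a list of scores.
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def multiscale_attention_fuse(feat, scales=(1, 2, 4)):
    # Build a feature pyramid, score each scale with softmax attention,
    # and fuse the upsampled scales as a weighted sum at full resolution.
    n = len(feat)
    pyramid = [upsample(avg_pool(feat, s), n) for s in scales]
    scores = softmax([sum(p) / n for p in pyramid])  # one attention weight per scale
    return [sum(w * p[i] for w, p in zip(scores, pyramid)) for i in range(n)]

fused = multiscale_attention_fuse([1.0, 2.0, 3.0, 4.0])
print(len(fused))  # 4
```

In a real STISR network the pooling and upsampling would be learned convolutions on 2-D feature maps and the attention scores would come from a trained sub-network; this sketch only illustrates the fuse-by-attention pattern.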
Source journal: IEEE Transactions on Artificial Intelligence (CiteScore 7.70, self-citation rate 0.00%)