Attention-based Text Recognition in the Wild

News. Phi Delta Epsilon. Pub Date: 2020-01-01. DOI: 10.5220/0009970200420049

Zhi-Chen Yan, Stephanie A. Yu

Citations: 0

Abstract

Recognizing text in real-world scenes is an important research topic in computer vision. Many deep-learning-based techniques have been proposed. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. In this approach, useful 2D spatial information in the input image may be lost due to vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which is then translated into a sequence of output characters using this decoder. Our encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
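
To illustrate why keeping 2D spatial structure can matter, the following is a minimal sketch of one attention step computed over every position of a 2D feature map instead of over a 1D sequence of feature vectors. It is not the paper's spatiotemporal decoder, whose details the abstract does not give; it assumes plain dot-product attention, and the names `attend_2d`, `feature_map`, and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_2d(feature_map, query):
    """One dot-product attention step over an H x W x C feature map.

    feature_map: nested lists of shape H x W x C (assumed encoder output).
    query: list of length C (assumed decoder state for the current character).
    Returns the context vector and the H x W grid of attention weights.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the grid in row-major order so one softmax spans all positions.
    flat = [feature_map[r][c] for r in range(height) for c in range(width)]
    scores = [sum(q * f for q, f in zip(query, vec)) for vec in flat]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the feature vectors.
    context = [
        sum(w * vec[k] for w, vec in zip(weights, flat))
        for k in range(len(query))
    ]
    # Restore the 2D layout so the weights can be read as a spatial map.
    weight_grid = [weights[r * width:(r + 1) * width] for r in range(height)]
    return context, weight_grid


# Toy 2 x 3 feature map with 2 channels; values are made up for illustration.
feature_map = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [5.0, 0.0], [0.0, 1.0]],
]
query = [2.0, 0.0]
context, weight_grid = attend_2d(feature_map, query)
```

Because the weights are returned as an H x W grid, the decoder can focus on a specific row and column of the image, which a representation collapsed to a single row of vectors cannot express.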
